FROM THE EDITOR



This book begins a new effort by BYTE Publications to provide our readers with the best
available manuscripts on the major topics of interest to the home computerist. Included in the
new series of BYTE's books are reprints of the best articles from past issues of BYTE magazine,
plus new material which has not been printed anywhere before. The books will be organized in
logical volumes of related topics. This provides the reader with vital information from previous
BYTE issues which he or she might have missed, new material that has not appeared in the
magazine, plus a book covering one specific theme for quick, easy reference.

Manuscripts included in these books are of the same high quality as those found in BYTE 
magazine, because we use the same stringent criteria in selecting new manuscripts for inclusion 
in these books as we do in choosing them for the magazine. Generally, the additional criterion 
used to select manuscripts for the books instead of the magazine is a constraint on the length of 
articles used in the magazine itself. In addition, we receive so many quality manuscripts that we 
could never possibly include them all in BYTE magazine. Therefore, in our efforts to give the 
reader all the information needed to be a successful microcomputerist, we have decided to 
make these manuscripts available in book form. 

The book that you are holding in your hands is the first in a series on the general topic of 
Programming Techniques. This particular book deals with the details of the theory behind the 
design of the various aspects of programs. Anyone who has programmed for any length of time
will agree that the most critical part of writing a program of any kind (application, system
software, etc.) is the design phase, both the initial specifications and the program logic design.
The actual coding of the program amounts to more of a mechanical process once the initial 
design of the program has taken place. Therefore, it is easy to see that unless the original design 
of the program is correct, the program cannot be expected to work as per specifications. 

The purpose of this book, then, is to provide the personal computer user with the techniques 
needed to design efficient, effective, maintainable programs. Included in the topics covered are 
structured program design, modular programming techniques, program logic design, and 
examples of some of the more common traps the casual, as well as the experienced, program- 
mer may fall into. In addition, details on various aspects of the actual program functions, such 
as hashed tables and binary tree processing, are included. 

Further books in this series will make available new techniques and further developments of 
the existing ones as they occur. This will allow you, the personal computer user, to stay up to 
date with the current technology of programming skills. 

Blaise W. Liffick 
Editor 



PROGRAM STRUCTURE 



About This Section 



For the last several years, those of us whose profession has been programming (applications,
systems, scientific, whatever) have been bombarded on all sides with the latest philosophies of
programming: structured, modular, top down, bottom up, GOTOless, etc. Not only do we get
encouragement from employers to embrace whatever the most current popular technique for
coding is, but we also get it from others in our profession who are adherents to one or the other
philosophy. This is not to imply that these techniques lack merit, but most
of the coding philosophers are talking about just that: coding. The main thrust of their
basic arguments is against poor coding practices. And that's just fine. But they forgot one
important detail: once a program has been designed, all the coding techniques in the world are
generally ineffective because the major portion of the program logic has already been set! The
specifications and initial design of the program predetermine to a great extent how the coding
can be performed.

In the following section, the techniques for designing effective programs are presented. Both
the amateur and professional programmer will profit from these practical techniques of design
by being able to produce essentially error-free code. And for the amateur programmer there is
an added bonus for following these practices: instant documentation! By carefully designing
the function of the program before ever coding a single line, you ensure that once the coding is
completed, you can add to it at any time. The code written so long ago will be easily
understood, and you will know where and how to make any necessary changes.

In addition, if everyone followed similar guidelines for designing programs, trading programs 
would be a painless and easy way to expand your program library. You could instantly under- 
stand what anyone else's program was doing. And while someone's 6800 code definitely will 
not run on your 8080 machine, the program has already been totally designed and can be easily 
coded into any other language! 



Structured Program Design 



David A Higgins 



In the world of electronics, no experi- 
menter in his right mind would build a cir- 
cuit by throwing a few parts together with 
some wire and some hope, then attaching a 
line cord and plugging it in to see if it works. 
Not only are you likely to destroy some 
very expensive parts, but it is also a good 
way to get fried, or at least get a new hairdo. 

Yet, after all the trouble that a serious 
microcomputer hobbyist will take to insure 
that his circuit is put together correctly be- 
fore he ever turns it on, he will invariably 
try to program his new computer by using 
a technique analogous to the one above. 
That is why his programs almost never 
run right the first time, if indeed they ever 
manage to run right at all. It is also why 
many microcomputer buffs stay up until
odd hours of the night drinking coffee by
the gallon in an effort to find that one
little bug.

But there is hope. I'm sure that nearly 
everyone involved with computers has heard 
something about structured programming in 
one form or another. It is not really a new 
technique, having been preached about for 
many years. However, the tools and meth- 
odologies available to design programs have 
changed radically over the years. 

In the beginning there were flowcharts,
which looked like five-dimensional octopi
or the corporate structure of a
conglomerate. Despite the absence of a
consistent approach that would enable
everyone to design a program using
flowcharts, those programmers who did
bother to work out their problem with a
flowchart first usually seemed to have more
luck in getting programs to run sooner and
better than programmers who did not.

Figure 1: The Warnier-Orr diagram showing
the basic structure of the BUG program.

Structuring Tools 

The development of mathematics would
surely have been stymied if Roman numerals
had been retained as our number system. In
much the same manner, the science of
structured program design would have been
mired down if only flowcharts had been
available for developing programs. It is not
that calculus is impossible with Roman
numerals; it's just that it's extremely
difficult. Thus, over the years, a number of
design and documentation tools were
developed to better enable a programmer to
understand the problem before going out to
do battle with the program.

Figure 2: Diagram of the logic for the PLAYER and COMPUTER TURNS
routines of the BUG program. Note that an item with a bar over it
denotes the complement of that item.

TOP-DOWN or GOTO-less programming, 
developed by Dijkstra and others, was pro- 
bably the first major attempt to solve the 
design versus coding problem. Dijkstra sim- 
ply observed that the more GOTOs that 
were in a program, the less likely it was to 
run correctly. Dijkstra called such programs 
"spaghetti bowl" programs, because if you 
drew a line from each GOTO in the program 
to its destination, you ended up with a mess 
that looked like a bowl of spaghetti. He 
showed how any program could be written
with just a few simple flow structures
without any GOTOs. His techniques
produced simple, readable code that was easy
to test and maintain. So, the big push among
design aficionados was to eliminate the
GOTOs in their programming. Although
TOP-DOWN programming was a big
advancement over flowcharting, it was just that:
programming; it was a technique for coding
a program, not necessarily designing it.

Another technique, IBM's HIPO (and 
later HIPO-DB) entered the design field 
almost by chance, being primarily a docu- 
mentation tool that was also being used 
for program design. The major drawback to 
HIPO techniques, besides the fact that they 
did not work well for designing a program, 
was their tendency to produce 50 pages of 
documentation for a three page program. 



Warnier-Orr Diagrams - A New Approach 

Within the last four years a new
technique for program design has evolved from
the work of Jean-Dominique Warnier
(pronounced warn'-yay) in France, and Kenneth
T Orr of Langston, Kitch and Associates in
Topeka KS. The technique has foundations
in set theory and Boolean algebra, and holds
much promise for program design
applications. Warnier-Orr diagrams, as we have
called them here in the United States, allow
programmers to design faster than ever
before, to code programs with little or no
effort, and to produce programs that usually
run correctly the first time. The approach
is not limited to small programs. Nothing
will make a believer out of someone quicker
than a 20 page COBOL program which runs
correctly the first time. The Warnier-Orr
technique stresses design over coding and
contends that once a problem is designed, it
does not matter what programming language
you code it in! At Langston, Kitch and
Associates, people have used the technique
to program in COBOL, PL/I, ALGOL,
FORTRAN, BASIC, RPG II and assembler
languages. It works equally well for all of
them.



Warnier-Orr Diagram 

The simplest way to learn about Warnier- 
Orr diagrams is to see examples of them. 
Warnier-Orr diagrams are very easy to learn 
and use; however, be forewarned that this is 
a technique that is sometimes deceptively 
simple, but not as trivial as it often seems. 

Let's consider the relatively simple game
of BUG. In this game the computer rolls a
die, once for itself and once for its
opponent. Each number of the die corresponds to
a part of the BUG's anatomy: 1 = BODY,
2 = NECK, 3 = HEAD, 4 = ANTENNAE,
5 = TAIL, and 6 = LEGS. The object of the
game is to finish your bug before the
computer finishes its bug. Other rules: you must
have a body before you can have legs, a neck
or a tail; you must have a neck before you
can have a head; and you must have a head
before you can have antennae. One body,
one neck, one head, one tail, six legs and
two antennae are needed to complete a bug.
Figure 1 is a Warnier-Orr diagram showing 
the basic structure of the BUG program. 
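Because the rules are stated so precisely, they can be written down as data before any program logic is designed. The following is a minimal Python sketch, not part of the original program; the names `DIE_FACE`, `REQUIRED`, `PREREQUISITE`, `can_add` and `is_complete` are hypothetical, and a bug is assumed to be a dictionary of part counts.

```python
# Hypothetical encoding of the BUG rules as data (names are illustrative).
DIE_FACE = {1: "body", 2: "neck", 3: "head", 4: "antennae", 5: "tail", 6: "legs"}

# How many of each part a finished bug needs.
REQUIRED = {"body": 1, "neck": 1, "head": 1, "antennae": 2, "tail": 1, "legs": 6}

# The part that must already be present before this one can be added.
PREREQUISITE = {
    "body": None,
    "neck": "body",
    "tail": "body",
    "legs": "body",
    "head": "neck",
    "antennae": "head",
}

TOTAL_PARTS = sum(REQUIRED.values())  # 1 + 1 + 1 + 2 + 1 + 6 = 12


def can_add(bug: dict, part: str) -> bool:
    """True if `part` may be added to `bug`, a dict of part name -> count."""
    needed = PREREQUISITE[part]
    has_prerequisite = needed is None or bug.get(needed, 0) > 0
    return has_prerequisite and bug.get(part, 0) < REQUIRED[part]


def is_complete(bug: dict) -> bool:
    """True once every part is present in the required number."""
    return all(bug.get(part, 0) == count for part, count in REQUIRED.items())
```

Summing the required counts gives 12, which is the value the end-of-turn test in listing 1 (lines 1060 and 1070) compares each player's part count against.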

The Warnier-Orr diagram is read left to 
right, top to bottom, just like conventional 
English text. The brackets enclose logically 
related operations, the largest of which is the 
program itself. The BUG program is com- 
posed of three logical sections: 

• The BEGIN PROGRAM section,
where the player's name is requested
and there is an explanation of the
game rules. Note that the ⊕ symbol
between the modules YES and NO
denotes the exclusive OR function,
meaning that one or the other but not
both of the modules will be
performed. Observe also that this is
reflected in the number of times that
each module may be performed: 0 if
the condition is false and 1 if the
condition is true.

• The process section, GAMES, where 
the playing of the game actually takes 
place. The (1,g) denotes that the sec- 
tion is to be performed at least once, 
and possibly many (g) times. 

• The END PROGRAM section, which 
in this case is empty, but which 
usually contains things such as the 
closing of files, the goodbye message, 
etc. 
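In a block structured language, the two notations just described map onto ordinary control structures: a pair of modules joined by ⊕ becomes an if/else, and a (1,g) bracket becomes a loop whose body runs once before the repeat test is made, which is what lines 60 through 80 of listing 1 do with GOSUB and GOTO. This Python sketch is illustrative only; `begin_program`, `games`, and the callables passed to them are hypothetical names.

```python
from typing import Callable


def begin_program(wants_rules: bool,
                  explain_rules: Callable[[], None],
                  skip: Callable[[], None]) -> None:
    """YES (0,1) (+) NO (0,1): exactly one of the two modules is performed."""
    if wants_rules:
        explain_rules()   # YES: performed once when the condition is true
    else:
        skip()            # NO: performed once when the condition is false


def games(play_one_game: Callable[[], None],
          wants_another: Callable[[], bool]) -> int:
    """GAMES (1,g): performed at least once, then repeated as requested.

    Returns the number of games played (g).
    """
    played = 0
    while True:
        play_one_game()
        played += 1
        if not wants_another():   # the test comes after the body
            break
    return played
```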



The rest of the brackets decompose in a 
similar fashion. The GAMES procedure 
breaks down into the beginning of the game, 
(BEGIN GAME), the turns that each player 
takes (TURNS), and the end of the game 
(ENDGAME). 

Notice that logically there are things that
only happen at the beginning of the program
and things that only happen during the
playing of the game itself. The Warnier-Orr
diagrams allow you to see very easily just
where and when a particular event must take
place. After examining figure 1 carefully to
make sure that you understand how the
diagrams work, move on to the explanation
of the PLAYER and COMPUTER TURNS
section shown in figure 2.

Listing 1: A structured BASIC program that was written using the
Warnier-Orr diagrams of figures 1 thru 3. This code executed correctly the
first time even though it was the author's first attempt at writing a BASIC program.

10 REM BUG PROGRAM
20 REM BEGIN PROGRAM
30 DIM HEAD(2), BODY(2), LEGS(2), TAIL(2), ANTE(2), NECK(2), CNT(2)
40 GOSUB 120
50 REM GAMES (1,G)
60 LET EPGM = 0
70 GOSUB 200
80 IF EPGM = 0 THEN GOTO 70
90 REM END PROGRAM
100 STOP
110 REM BEGIN PROGRAM SUBROUTINE
120 PRINT "ENTER YOUR FIRST NAME"
130 INPUT NAME$
140 PRINT "DO YOU WANT AN EXPLANATION OF THE RULES? ENTER YES OR NO"
150 INPUT ANS$
155 LET TEST = SCOMP("YES", ANS$)
160 IF TEST = 0 THEN GOSUB 1200 ELSE ;
170 RETURN
180 REM GAMES SUBROUTINE
190 REM BEGIN GAME
200 GOSUB 290
210 REM TURNS (1,T)
220 LET EGAM = 0
230 GOSUB 390
240 IF EGAM = 0 THEN 230
250 REM END GAME
260 GOSUB 1150
270 RETURN
280 REM BEGIN GAME SUBROUTINE
290 LET BODY(1), BODY(2) = 0
295 LET CNT(1), CNT(2) = 0
300 LET NECK(1), NECK(2) = 0
310 LET HEAD(1), HEAD(2) = 0
320 LET ANTE(1), ANTE(2) = 0
330 LET TAIL(1), TAIL(2) = 0
340 LET LEGS(1), LEGS(2) = 0
350 RETURN
360 REM TURNS SUBROUTINE
370 REM PLAYER'S TURN
380 REM LET PLAYER START TURN
390 PRINT "HIT RETURN TO ROLL DIE"
400 INPUT A
410 LET PLAY = 1
420 GOSUB 520
430 REM COMPUTER'S TURN
440 LET PLAY = 2
450 GOSUB 520
460 REM END TURN
470 GOSUB 1060
480 RETURN
490 REM TURN SUBROUTINE
500 REM PLAY=1: PLAYER'S TURN; PLAY=2: COMPUTER'S TURN
510 REM ROLL DIE
520 LET ROLL = FIX(RND(0) * 6.0) + 1
530 PRINT : "ROLL IS A ", ROLL
540 IF ROLL = 1 THEN IF BODY(PLAY) # 1 THEN GOSUB 690 ELSE ; ELSE ;
550 IF ROLL = 1 THEN 650
560 IF ROLL = 2 THEN IF BODY(PLAY) = 1 THEN IF NECK(PLAY) # 1 THEN GOSUB 760
570 IF ROLL = 2 THEN 650
580 IF ROLL = 3 THEN IF BODY(PLAY) = 1 THEN IF NECK(PLAY) = 1 THEN IF HEAD(PLAY) # 1 THEN GOSUB 820
590 IF ROLL = 3 THEN 650
600 IF ROLL = 4 THEN IF HEAD(PLAY) = 1 THEN IF ANTE(PLAY) # 2 THEN GOSUB 880
610 IF ROLL = 4 THEN 650
620 IF ROLL = 5 THEN IF BODY(PLAY) = 1 THEN IF TAIL(PLAY) # 1 THEN GOSUB 940
630 IF ROLL = 5 THEN 650
640 IF ROLL = 6 THEN IF BODY(PLAY) = 1 THEN IF LEGS(PLAY) # 6 THEN GOSUB 1000
650 RETURN
690 REM BODY SUBROUTINE
700 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A BODY"
710 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A BODY"
720 LET CNT(PLAY) = CNT(PLAY) + 1
730 LET BODY(PLAY) = 1
740 RETURN
750 REM NECK SUBROUTINE
760 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A NECK"
770 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A NECK"
780 LET CNT(PLAY) = CNT(PLAY) + 1
790 LET NECK(PLAY) = 1
800 RETURN
810 REM HEAD SUBROUTINE
820 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A HEAD"
830 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A HEAD"
840 LET CNT(PLAY) = CNT(PLAY) + 1
850 LET HEAD(PLAY) = 1
860 RETURN
870 REM ANTENNAE SUBROUTINE
880 LET ANTE(PLAY) = ANTE(PLAY) + 1
890 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS ", ANTE(1), " ANTENNAE."
900 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS ", ANTE(2), " ANTENNAE."
910 LET CNT(PLAY) = CNT(PLAY) + 1
920 RETURN
930 REM TAIL SUBROUTINE
940 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A TAIL"
950 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A TAIL"
960 LET CNT(PLAY) = CNT(PLAY) + 1
970 LET TAIL(PLAY) = 1
980 RETURN
990 REM LEGS SUBROUTINE
1000 LET LEGS(PLAY) = LEGS(PLAY) + 1
1010 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS ", LEGS(1), " LEGS."
1020 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS ", LEGS(2), " LEGS."
1030 LET CNT(PLAY) = CNT(PLAY) + 1
1040 RETURN
1050 REM END TURN SUBROUTINE
1060 IF CNT(1) = 12 THEN 1090
1070 IF CNT(2) = 12 THEN 1110
1080 GOTO 1130

In figure 2, we have represented the logic 
for each of the players' turns during the 
game. At the beginning of each turn, the die 
is rolled to determine the part of the BUG's 
body that the player may receive. Whatever 
the roll, we then have a logical path to 
follow. Again, please note that the presence
of the ⊕ between each of the possible
rolls denotes mutual exclusion, i.e., only one
of the paths may be selected. This
particular structure is known as a case statement.

If the player rolls a 4, we first find the 
instructions to follow for a roll of 4 and 
check to see if the player has a BUG head. 
If he does, we then check to see whether or 
not the player already has two antennae. 
If he does, then we do nothing. If he does 
not have two antennae yet, we give him 
one antenna. If he does not have a BUG 
head, then again we do nothing. In a similar 
fashion, all of the possible rolls and their 
associated procedures are explained. Now 
let's move on to the Warnier-Orr diagram for 
the end of the turn, which is shown in figure 
3. 
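The case structure of figure 2 can be sketched in Python as a single selection on the roll, with each case checking its prerequisites before adding a part. This is an illustration under our own naming (`take_turn`, and a bug held as a dictionary of part counts), not a transcription of listing 1; it returns the part that was added, or None when the roll gives nothing.

```python
def take_turn(bug: dict, roll: int):
    """Apply one die roll to `bug` (a dict of part name -> count).

    The if/elif chain selects exactly one case, mirroring the mutually
    exclusive paths of figure 2. Returns the part added, or None.
    """
    def have(part: str) -> int:
        return bug.get(part, 0)

    part = None
    if roll == 1:
        if have("body") < 1:
            part = "body"
    elif roll == 2:
        if have("body") == 1 and have("neck") < 1:
            part = "neck"
    elif roll == 3:
        if have("neck") == 1 and have("head") < 1:
            part = "head"
    elif roll == 4:
        # A 4 counts only if there is a head and fewer than two antennae.
        if have("head") == 1 and have("antennae") < 2:
            part = "antennae"
    elif roll == 5:
        if have("body") == 1 and have("tail") < 1:
            part = "tail"
    elif roll == 6:
        if have("body") == 1 and have("legs") < 6:
            part = "legs"
    else:
        raise ValueError("a die roll must be between 1 and 6")

    if part is not None:
        bug[part] = have(part) + 1
    return part
```

Walking through a roll of 4 against an empty bug returns None (no head yet), exactly the "do nothing" path described above.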

If either player has won the game at the 
end of a turn, the computer declares the 
winner and ends the game. If neither player 
has won, the computer does nothing and 
cycles through for another turn. 
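The end-of-turn logic of figure 3 reduces to comparing each side's part count with the 12 parts of a finished bug. A hypothetical Python version follows (`end_turn` and its return convention are our own); as in lines 1060 through 1130 of listing 1, the player is checked first, so the player wins if both bugs are finished on the same turn.

```python
PARTS_IN_FINISHED_BUG = 12  # 1 body, 1 neck, 1 head, 2 antennae, 1 tail, 6 legs


def end_turn(player_count: int, computer_count: int, player_name: str):
    """Return (game_over, message) after both sides have rolled."""
    if player_count == PARTS_IN_FINISHED_BUG:
        return True, f"{player_name}'S BUG IS FINISHED, YOU WIN"
    if computer_count == PARTS_IN_FINISHED_BUG:
        return True, "COMPUTER'S BUG IS FINISHED, I WIN"
    return False, None  # neither has won: cycle through for another turn
```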



Structured Programming 

Having fully understood the problem, 
coding the BUG program is a simple and 
straightforward process. For this particular 
example I coded the program shown in list- 
ing 1 in a version of BASIC. 

As you can see, each bracket of the 
original Warnier-Orr diagram roughly corre- 
sponds to a subroutine in the finished code: 
the process GAMES, for instance, becomes 
the subroutine at line number 180 which is 
called repeatedly by the branch at line 80 
until EPGM equals 1, indicating that no 
more games are to be played; the process 
BEGIN PROGRAM is handled by the sub- 
routine at line 110, and so forth. 
The resultant code is: 

• easy to read and understand 

• easy to change and maintain 

• already documented 

• logically correct. 

It is also a program that will run correctly
the first time, barring unforeseen syntax
errors for those of us who can't type or
spell. All of this is possible because the
program was thoroughly designed before
it was even partially coded.
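The bracket-to-subroutine correspondence carries over directly to a language with procedures and loops, where GOSUB becomes a call and the IF ... GOTO pairs become while loops. The Python skeleton below is a sketch under stated assumptions: `bug_program`, `game`, `play_turn` and `ask_play_again` are hypothetical names, and the dice and keyboard are passed in as functions so that the structure can be exercised without a terminal.

```python
from typing import Callable, List

Turn = Callable[[str], bool]   # play_turn("PLAYER" or "COMPUTER") -> bug finished?


def bug_program(play_turn: Turn, ask_play_again: Callable[[], bool]) -> List[str]:
    """Figure 1's top bracket: BEGIN PROGRAM, GAMES (1,g), END PROGRAM."""
    winners = []
    # BEGIN PROGRAM: asking for the name and explaining the rules would go here.
    while True:                          # GAMES (1,g), like lines 60-80
        winners.append(game(play_turn))
        if not ask_play_again():         # END GAME asks about another game
            break
    # END PROGRAM: empty, as in figure 1.
    return winners


def game(play_turn: Turn) -> str:
    """One game: BEGIN GAME, TURNS (1,t), END GAME. Returns the winner."""
    # BEGIN GAME: resetting both bugs would go here.
    while True:                          # TURNS (1,t), like lines 220-240
        player_done = play_turn("PLAYER")
        computer_done = play_turn("COMPUTER")
        # END TURN: the player is checked first, as in lines 1060-1070.
        if player_done:
            return "PLAYER"
        if computer_done:
            return "COMPUTER"
```

Each function body reads top to bottom in the same order as its bracket, which is what makes the code "already documented".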

Conclusion 

Warnier-Orr diagrams are a giant leap in 
the right direction for structured program- 
ming. They represent an attitude which, for 
the first time since people have been playing 
with computers, can lead to consistently 
reliable software that is very easy to main- 
tain. Currently, most data processing de- 
partments spend over 80% of their time 
and effort repairing old code that has 
suddenly gone bad. Warnier-Orr diagrams 
also provide the means to produce software 
of a quality that has never before been 
possible. 

If you think that you are interested in 
using Warnier-Orr diagrams to help you 
solve some of your software headaches, by 
all means try them. But as I mentioned 
above, this technique looks deceptively 
simple, and you may not have much success. 
Understanding a diagram such as the one 
presented in this text is one thing; creating 
one from scratch is another. 

If you do get bogged down, please feel 
free to write us for more information. If you 
try them, like them, and think you've done 
something exciting with them, again feel free
to write us and tell us what you've done.



Listing 1, continued:

1090 PRINT : NAME$, "'S BUG IS FINISHED, YOU WIN"
1100 GOTO 1120
1110 PRINT : "COMPUTER'S BUG IS FINISHED, I WIN"
1120 LET EGAM = 1
1130 RETURN
1140 REM END GAME SUBROUTINE
1150 PRINT : "DOES ANYONE ELSE WANT TO PLAY?"
1160 INPUT ANS$
1165 LET TEST = SCOMP(ANS$, "YES")
1170 IF TEST # 0 THEN LET EPGM = 1
1180 RETURN
1190 REM EXPLANATION OF RULES SUBROUTINE
1200 PRINT "THE GAME OF BUG IS PLAYED AS FOLLOWS:"
1210 PRINT "A DIE IS ROLLED BY THE COMPUTER, AND EACH NUMBER"
1220 PRINT "ON THE DIE CORRESPONDS TO A PART OF THE BUG'S"
1230 PRINT "BODY: 1=BODY, 2=NECK, 3=HEAD, 4=ANTENNAE, 5=TAIL,"
1240 PRINT "6=LEGS. YOU NEED 1 BODY, 1 NECK, 1 HEAD, 2 ANTENNAE,"
1250 PRINT "1 TAIL, AND 6 LEGS TO COMPLETE A BUG."
1260 PRINT "THE OBJECT OF THE GAME IS TO BUILD YOUR BUG BEFORE"
1270 PRINT "THE COMPUTER BUILDS HIS."
1280 PRINT "HIT RETURN WHEN YOU ARE READY TO PLAY"
1290 INPUT A
1300 RETURN
Structured Programming with Warnier-Orr Diagrams



David A Higgins 



Part 1: Design Methodology



Any successful program design
methodology must be able to do several things: it
must produce consistent, low cost, high
reliability results; it must produce them
quickly, while still allowing for easy
maintenance later; and it must be simple enough to
allow anyone (and I do mean anyone) to use
it. Warnier-Orr diagrams (after Jean-Dominique
Warnier in France and Kenneth
T Orr in the United States) satisfy all of the
above requirements with an added bonus:
they produce structured programs that
nearly always run correctly at the first
effective trial. They allow people to produce
superprograms without being
superprogrammers.

The purpose of this article is to show how 
to develop and code a structured program 
using the Warnier-Orr methodology from 
start to finish. The technique is a straight- 
forward approach to producing correct pro- 
grams. It is just as valid and successful for 
personal microcomputer applications as it is 
for megacomputer applications in the world 
of business, science and industry. I feel that 
this method of designing a program is one of 
the most advanced state of the art software 
development techniques in existence today. 
It is a concise, step by step method with 
predictable results. 

Step One: Identify the Output 

This is the first, the primary and the most 
important rule of all for the construction of 
a correct program. It cannot be emphasized 
enough. The failure to first identify the 
outputs of a program is usually the primary 
reason programs fail to run correctly. 

You must ask yourself the questions:
"How will I be able to tell when I am
through with this program?" "What will the
printed, displayed and punched outputs
physically look like?" "What will the
program be able to do?" All of these questions
must be thoroughly answered before you
can even begin to think of coding the
program. Skipping this step because "Aw, I
know what I want to do," or "Gee, this isn't
any fun, let's start coding," is a common
mistake, and although you may get away
with it on a small program once in a while,
omitting it will kill you more often than not.

A good example of the kind of trouble 
you can get into by assuming that you know 
everything about a problem can be found in 
a recent popular film. In the movie Jeremiah 
Johnson, Jeremiah befriends an old hunter 
and trapper in the mountains. The old 
hunter asks Jeremiah if he can skin a bear. 
"Of course I can," he replies. In the next 
scene, we see the old man running down a 
hill towards the cabin closely pursued by a 
very large bear. The hunter runs into the 
open front door, leaps out of the back 
window and yells: "There . . . you skin that
one and I'll go get you another." Jeremiah
failed to do one basic thing: he forgot to ask
whether the bear he was supposed to skin
was dead. Skinning a dead bear is one thing;
skinning one that is still running around the
room trying to skin you is quite another.
In the same way, writing a program after it has
been properly defined is one thing, and trying to
write one when you aren't even sure what it
is supposed to do when you are finished is
another.

Defining outputs is not really an un- 
reasonable requirement to make; after all, no 
building contractor would begin construc- 
tion without first knowing what the finished 
building was supposed to look like; no 
electrical engineer would start soldering 
parts together without a schematic diagram.
In fact, no profession (reliable profession 
anyway) involved in the business of putting 
things together ever starts to build anything 
unless they know what it will look like after 
they are done. Yet, that is precisely the way 
most programmers try to write programs. 
Then they wonder what went wrong when 
they have problems. The same programming 
principles which apply to the professional 
apply just as much to the amateur, for no 
one's time is unlimited. 

After defining all of the outputs of the 
program, the next step is to define the 
logical data base, although you will probably 
never really spend much time at this step 
with most personal microcomputer applica- 
tions. 



Step Two: Define the Logical Data Base 

The reason this step is trivial for many
personal use applications is that the
logical data base typically consists of only
one numeric field: typically the field
holding a person's response to a
program-generated question. For illustrative purposes
let us look at a home computer application
that requires a slightly more complex data
base arrangement. Take for instance a
computer program that would balance the family
checkbook and produce a financial report
each month. The report designed in step one
might look something like figure 1.

If you were keeping manual records that 
you wanted to be able to search very easily, 
you would keep each one of those entries, 
perhaps on index cards, filed by year, by 
month and by date. Figure 2 illustrates a
way of representing the logical data
structure for the checkbook balance report in
Warnier-Orr notation.

In figure 2, you can see the logical data 
structure for the checkbook balance report. 
The report is organized by year; within each 
year by months; within each month by days; 
and within each day by transactions, which 
are either debits (checks) or credits (de- 
posits). Note that year, month, day, and 
transactions all appear in the report at least 
once and possibly many times; thus we see 
the notation (1,n) in the diagram. Having an 
entry for a day that had no transactions or 
having a monthly report with no days is 
hardly worth the trouble. However, each 
transaction is either a credit transaction 
(credit occurring once, and debit not occur- 
ring) or a debit transaction (debit occurring 
once and credit not occurring). This
condition is reflected on the chart by the ⊕
symbol, which is the symbol for mutual
exclusion.
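Expressed as types, the structure of figure 2 nests one level inside another: each (1,n) level must hold at least one entry, and each transaction must be either a debit or a credit but not both. This Python sketch is illustrative; the class and field names are our own, and amounts are assumed to be kept in cents to avoid rounding error.

```python
from dataclasses import dataclass
from typing import List, Optional


def _at_least_once(items: list, what: str) -> None:
    # A (1,n) level must occur at least once.
    if not items:
        raise ValueError(f"{what} must occur at least once")


@dataclass
class Transaction:
    debit: Optional[int] = None    # amount in cents, present for a check
    credit: Optional[int] = None   # amount in cents, present for a deposit

    def __post_init__(self) -> None:
        # DEBIT (0,1) (+) CREDIT (0,1): exactly one of the two is present.
        if (self.debit is None) == (self.credit is None):
            raise ValueError("a transaction must be either a debit or a credit")


@dataclass
class Day:
    number: int
    transactions: List[Transaction]          # (1,t)

    def __post_init__(self) -> None:
        _at_least_once(self.transactions, "TRANSACTIONS")


@dataclass
class Month:
    name: str
    days: List[Day]                          # (1,d)

    def __post_init__(self) -> None:
        _at_least_once(self.days, "DAY")


@dataclass
class Year:
    number: int
    months: List[Month]                      # (1,m)

    def __post_init__(self) -> None:
        _at_least_once(self.months, "MONTH")
```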

MONTHLY FINANCIAL REPORT
FOR THE MONTH OF JANUARY 1977

                                     BALANCE FORWARD OF $231.90

DATE  CHECK#  TO:                      DEBIT   CREDIT   BALANCE
  1    978    GROCERY STORE             2.23             229.67
              -MILK, BREAD, EGGS
  1    979    PHONE COMPANY            37.14             192.53
  3    980    GAS BILL                 25.61             166.92
  5    981    GEORGE FREDRICK           5.00             156.92
              -SHOVELLING SNOW
              PAYCHECK DEPOSIT                  312.18   469.10
  6    982    ELECTRIC COMPANY         23.15             445.95
 31   1013    BYTE MAGAZINE            12.00             237.11
              -SUBSCRIPTION RENEWAL
                                       CURRENT BALANCE   237.11

Figure 1: Proposed output of a computer program for balancing a checkbook
and producing an end of month report.

One important point needs to be made
here. The diagram of figure 2 is not the
logical data base for this report; it is only the
report's logical data structure. Making a
chart of the logical data base requires that
we map the data elements that appear in the
report onto the logical report structure, as
we have done in figure 3. In figure 2 we
showed conceptual relationships of one part
of the structure to another. In figure 3 we've
filled in the required details needed to
complete each level of the structure. One
level of the structure corresponds to one
bracket and the levels are counted left to
right.

CHECK FILE
  YEAR (1,y)
    MONTH (1,m)
      DAY (1,d)
        TRANSACTIONS (1,t)
          DEBIT (0,1)
          ⊕
          CREDIT (0,1)

Figure 2: Logical data structure for the checkbook balance report. The
notation (1,n) indicates an operation will take place at least once and up to n
times, inclusive.

CHECK FILE
  YEAR (1,y)
    YEAR NUMBER
    MONTH (1,m)
      NAME OF MONTH
      BALANCE FORWARD
      DAY (1,d)
        DAY NUMBER
        TRANSACTIONS (1,t)
          DEBIT (0,1)
            CHECK NUMBER
            "TO" DESCRIPTION
            "FOR" DESCRIPTION (0,1)
            AMOUNT
          ⊕
          CREDIT (0,1)
            CREDIT DESCRIPTION
            AMOUNT
          BALANCE AMOUNT
      CURRENT BALANCE

Figure 3: The logical data base is generated by mapping the data elements
that appear in the report onto the logical data structure.

Step Three: Define the Physical Data Base

Defining the physical data base of a
program is largely a packaging decision: what
physical arrangement of the data in the
computer will best suit the needs of the
program? The only help I can give you on
this is the simple suggestion that the physical
representation should mirror the logical
representation in all but the most extreme
cases. These are hardware decisions. You
may wish to construct a file one way if you
are using a cassette tape storage system; you
may construct it another way if you have a
floppy disk. You would not want to impose
a file structure that forced a cassette tape to
behave like a disk by running back and forth
through the tape at high speed. That is a
good way to burn up a tape drive in a hurry.
Ultimately, as memories become faster,
more versatile and more efficient, the
physical data base will probably always be able to
mirror the logical data base. Magnetic bubble
memories, for instance, have no moving
parts to burn up.

In the checkbook balance report program 
the simplest physical data base would be a 
sequential file. The necessary information 
and a brief description of each transaction 
could be stored in the order shown in figure 
4, read left to right.

Given that we have a file with this 
information on it which is sorted by year, 
month, day and transaction, producing a 
report program is almost a trivial exercise. 
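As a minimal sketch of such a file, assume one comma-separated record per line with the fields in the order of figure 4. Figure 4 as reproduced here lists no amount field, so this sketch adds a hypothetical `amount` field (in cents) at the end of each record; the `C`/`D` flag values and the YYYYMMDD date are likewise assumptions. The text assumes the file is already sorted; the sort on reading is done here only so the example is self-contained.

```python
import csv
import io
from typing import List, NamedTuple


class Record(NamedTuple):
    date: str            # YYYYMMDD; first, because the file is sorted by date
    flag: str            # "C" = check (debit), "D" = deposit (credit)
    description1: str
    description2: str
    number: str          # transaction (check) number; may be empty
    amount: int          # hypothetical extra field, in cents


def read_records(text: str) -> List[Record]:
    """Parse one comma-separated record per line and return them by date."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:                      # ignore blank lines
            continue
        date, flag, desc1, desc2, number, amount = row
        records.append(Record(date, flag, desc1, desc2, number, int(amount)))
    # sorted() is stable, so same-day records keep their file order.
    return sorted(records, key=lambda r: r.date)
```

Putting the date first means that a plain string sort on that field already yields year, month and day order, which is the arrangement the report program needs.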



Step Four: Design the Process Structure 

Since in this case we are working with a 
single program, the process structure will 
ultimately represent the program structure. 
Were we designing an entire system, an 
accounts receivable system for instance, the 
process structure would represent many pro- 
grams and the associated system procedures 
that would operate them. The process struc- 
ture is obtained from the same logical data 
structure that the logical data base was 
derived from. 

Referring again to both figures 1 and 2, 
we can begin to design the program from the 
bottom to the top. Looking first at the left- 
most bracket of figure 2, which for this step 
is labeled REPORT PROGRAM, we could 
draw a structure thus: 



REPORT PROGRAM
    START PROGRAM
        OPEN FILES
    END PROGRAM
        CLOSE FILES



date | check or deposit flag | description field 1 | description field 2 | transaction number

Figure 4: A sequential file with a record format such as this is the simplest physical data base for the checkbook program. The information that is needed has been decided by the logical data base. The order in which the fields are put on the file depends on exactly what you intend to do. Since in this case we will be sorting by date, the date of the transaction appears first on the file.
Note that program structure is denoted by left to right positioning, and that sequences of operations are noted top (first) to bottom (last).

We can see that the only thing for us to 
do at the beginning of the program is to 
open the files, and the only thing to do at 
the end of the program is to close the files 
we have used. Moving right to the YEAR 
bracket, the process END YEAR must be 
defined. For this program there is nothing to 
do at the end of the year, so we fill in the 
bracket with the notation SKIP: 



YEAR (1,y)
    END YEAR
        SKIP



For the bracket labeled MONTH, there is the matter of printing the CURRENT BALANCE at the end of the month:



MONTH (1,m)
    END MONTH
        PRINT CURRENT BALANCE



There are no processes to be performed at 
the end of each DAY, therefore we show the 
END DAY process the same way as the END 
YEAR process: 



DAY (1,d)
    END DAY
        SKIP



Figure 5: Completed Warnier-Orr diagram for a checkbook balancing report program. This program arrangement will probably result in the smallest amount of memory being used. The sequences of operations at any given level (left-right position) are read from top to bottom. A level of operations corresponds to a logical level of procedure calls in a block-structured programming language.

REPORT PROGRAM
    BEGIN PROGRAM
        OPEN FILES
        SET INITIAL VALUES
        READ FIRST RECORD
    YEAR (1,y)
        BEGIN YEAR
        MONTH (1,m)
            BEGIN MONTH
                PRINT HEADINGS
                PRINT STARTING BALANCE
                INITIALIZE RUNNING BALANCE
            DAY (1,d)
                TRANSACTIONS (1,t)
                    BEGIN TRANSACTIONS
                    DEBIT (0,1)
                        MOVE CHECK NUMBER, CHECK "TO", AND CHECK AMOUNT TO PRINT LINE
                        SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
                        MOVE RUNNING BALANCE TO PRINT LINE
                        PRINT A LINE
                        PRINT SECOND LINE (0,1)
                        SPACE ONE LINE
                    ⊕
                    CREDIT (0,1)
                        MOVE DEPOSIT AMOUNT, DEPOSIT DESCRIPTION TO PRINT LINE
                        ADD DEPOSIT AMOUNT TO RUNNING BALANCE
                        MOVE RUNNING BALANCE TO PRINT LINE
                        PRINT A LINE
                        SPACE ONE LINE
                    END TRANSACTION
                        GET NEXT RECORD
                END DAY
            END MONTH
                PRINT CURRENT BALANCE
        END YEAR
    END PROGRAM
The TRANSACTIONS process is where most of the work is done. For each CREDIT or DEBIT, one line is printed (plus, for a DEBIT, possibly a second) showing the appropriate information; the running balance is updated, and the next record must be read:



TRANSACTIONS (1,t)
    DEBIT (0,1)
        MOVE CHECK NUMBER, CHECK "TO", AND CHECK AMOUNT TO PRINT LINE
        SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
        MOVE RUNNING BALANCE TO PRINT LINE
        PRINT A LINE
        PRINT SECOND LINE (0,1)
        SPACE ONE LINE
    ⊕
    CREDIT (0,1)
        MOVE DEPOSIT AMOUNT, DEPOSIT DESCRIPTION TO PRINT LINE
        ADD DEPOSIT AMOUNT TO RUNNING BALANCE
        MOVE RUNNING BALANCE TO PRINT LINE
        PRINT A LINE
        SPACE ONE LINE
    END TRANSACTION
        GET NEXT RECORD
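As a hedged sketch of one repetition of TRANSACTIONS, the following Python function applies the DEBIT (0,1) ⊕ CREDIT (0,1) choice, updates the running balance, and returns the lines it would print. The parameter names and the plain space-separated line format are assumptions; reading the next record belongs to END TRANSACTION and is left to the caller.

```python
def process_transaction(balance: int, is_check: bool, number: int,
                        desc1: str, desc2: str, amount: int) -> tuple:
    """One TRANSACTIONS repetition; returns (new_balance, lines_to_print)."""
    lines = []
    if is_check:                                   # DEBIT (0,1)
        balance -= amount                          # subtract check amount
        lines.append(f"{number} {desc1} {amount} {balance}")
        if desc2:                                  # PRINT SECOND LINE (0,1)
            lines.append(desc2)
    else:                                          # CREDIT (0,1)
        balance += amount                          # add deposit amount
        lines.append(f"{desc1} {amount} {balance}")
    lines.append("")                               # SPACE ONE LINE
    return balance, lines
```

Because DEBIT and CREDIT are mutually exclusive, a single if/else is enough; the optional second line is a nested (0,1) test inside the DEBIT branch only.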



With this much of the program design 
done, the only things to be filled in are the 
BEGIN brackets for each level. The entire 
diagram with these processes added is shown 
in figure 5. 

Looking at the Warnier-Orr diagram for the checkbook balance program, you can see the entire series of events which must take place to correctly process the report as it was given. Note also that this is the only correct structure that will produce the checkbook balance report: any other structure that will produce the report is isomorphic to this structure. The structure is also optimal in operation, in the sense that nothing is ever done unless it must be done.

The program which is coded from this structure will also have some predictable features. It will run as quickly as possible. It will usually require the least amount of storage. It is very easy to maintain, and it will run correctly at the first effective trial. […] work. Syntax runs are not effective trials, but, with a little diligence and effort, syntax errors can also be brought under control.

Part 2 will show how easy it is to fill in the details of structured programs using Warnier-Orr diagrams.
Structured Programming
with Warnier-Orr Diagrams



Part 2: Coding the Program 



David A Higgins 



In part 1 we carefully constructed a design structure. To make the most of that structure, a few words about programming style are in order. While it is true to a certain extent that any method of coding the structure will produce a logically correct program, syntax errors resulting from shoddy coding techniques, as well as problems with maintenance, indicate that a great deal of care should be exercised in the construction of the actual program code.

For this particular example, I'll use a 
fairly standard version of BASIC that 



REPORT PROGRAM
    BEGIN PROGRAM
        OPEN FILES
        SET INITIAL VALUES
        READ FIRST RECORD
    YEAR (1,y)
        BEGIN YEAR
        MONTH (1,m)
            BEGIN MONTH
                PRINT HEADINGS
                PRINT STARTING BALANCE
                INITIALIZE RUNNING BALANCE
            DAY (1,d)
                TRANSACTIONS (1,t)
                    BEGIN TRANSACTIONS
                    DEBIT (0,1)
                        MOVE CHECK NUMBER, CHECK "TO", AND CHECK AMOUNT TO PRINT LINE
                        SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
                        MOVE RUNNING BALANCE TO PRINT LINE
                        PRINT A LINE
                        PRINT SECOND LINE (0,1)
                        SPACE ONE LINE
                    ⊕
                    CREDIT (0,1)
                        MOVE DEPOSIT AMOUNT, DEPOSIT DESCRIPTION TO PRINT LINE
                        ADD DEPOSIT AMOUNT TO RUNNING BALANCE
                        MOVE RUNNING BALANCE TO PRINT LINE
                        PRINT A LINE
                        SPACE ONE LINE
                    END TRANSACTION
                        GET NEXT RECORD
                END DAY
            END MONTH
                PRINT CURRENT BALANCE
        END YEAR
    END PROGRAM

Figure 1: Final Warnier-Orr diagram description of the checkbook balance report program (reproduced from part 1).
[Figure: boxes labeled MATCH; NEGATIVE, ZERO, POSITIVE; COMPLEMENT DIFFERENCE; GET IDENTIFICATION CODE; USE H MATCH TO DETERMINE SIGHTING FLAG STATUS; USE VSIGHT TO DETERMINE … FLAG STATUS; RETURN]



runs on a J100 Jacquard Systems computer. The concepts and construction rules are just as applicable to Tiny BASIC, assembly language, and especially APL. Obeying the following five coding conventions will help you write a program that will execute correctly the first time.

Coding Convention 1: Names Should Be Indicative of Function

For versions of BASIC that only allow one-letter names, this is often a little hard, but for most other languages, which allow multiple-character symbols, it is a must. For instance, a field that contains an amount should be labeled AMOUNT, an address field should probably be called ADRESS, and so forth. Cutesy names such as SNEEZY, DOPEY, GRUMPY, and HELL (a perennial favorite label for adolescent COBOL programmers) are to be strictly avoided.

Coding Convention 2: Comments Should Be Used Freely

Comment lines in programs written in 
obscure languages, APL for instance, should 
probably outnumber actual lines of code. 
Comment lines are especially useful for 
explaining unclear methods of calculation, 
complex decisions, etc. 

Coding Convention 3: Every Bracket of a Warnier-Orr Diagram Should Represent a New Subroutine

Languages that do not permit subrou- 
tines or languages that limit the levels of 
nesting of subroutines are very tricky to 
use and should be avoided if at all possible. 
Save your spare change for three or four 
weeks and go buy a better version of BASIC; 
there are plenty of good ones on the mar- 
ket. In BASIC, each subroutine should be 
clearly labeled with REMark statements. 
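To illustrate convention 3 outside BASIC, here is a minimal Python sketch in which each bracket of the checkbook diagram becomes its own nested function (REPORT PROGRAM, YEAR, MONTH, DAY, TRANSACTIONS). It is not the article's program: the record keys year, month, day, debit, desc and amount are hypothetical, amounts are whole units, and print lines are collected in a list instead of being sent to a printer.

```python
def report_program(records: list, balance: int) -> list:
    """REPORT PROGRAM bracket. records must already be sorted by date."""
    lines = []
    pos = 0                                    # BEGIN PROGRAM: first record

    def rec():
        # The current record, or None once the file is exhausted.
        return records[pos] if pos < len(records) else None

    def year():                                # YEAR (1,y)
        y = rec()["year"]
        while rec() is not None and rec()["year"] == y:
            month()
        # END YEAR: SKIP

    def month():                               # MONTH (1,m)
        key = (rec()["year"], rec()["month"])
        lines.append(f"REPORT FOR {key[1]}/{key[0]}")       # BEGIN MONTH
        lines.append(f"BALANCE FORWARD {balance}")
        while rec() is not None and (rec()["year"], rec()["month"]) == key:
            day()
        lines.append(f"CURRENT BALANCE {balance}")          # END MONTH

    def day():                                 # DAY (1,d)
        key = (rec()["year"], rec()["month"], rec()["day"])
        while rec() is not None and (rec()["year"], rec()["month"], rec()["day"]) == key:
            transaction()
        # END DAY: SKIP

    def transaction():                         # TRANSACTIONS (1,t)
        nonlocal balance, pos
        r = rec()
        if r["debit"]:                         # DEBIT (0,1)
            balance -= r["amount"]
        else:                                  # CREDIT (0,1)
            balance += r["amount"]
        lines.append(f"{r['day']} {r['desc']} {r['amount']} {balance}")
        pos += 1                               # END TRANSACTION: next record

    while rec() is not None:
        year()
    # END PROGRAM: nothing to close in this in-memory sketch
    return lines
```

Each function is short and corresponds to exactly one bracket, so a change to, say, the month heading touches only month().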

Coding Convention 4: Subroutines Should Be as Short as Possible

If a subroutine contains too many statements, it is difficult to understand and maintain. It also means you are probably doing something in this subroutine that should be put in another subsequent subroutine. In most high-level languages a practical limit of 10 to 20 statements is appropriate. This rule is standard structured programming practice.



Figure 2: This is a flowchart chosen at random for comparison to a Warnier-Orr representation.



Coding Convention 5: GO TOs Should Be Avoided

In higher-level languages, GO TOs can often be, and should be, eliminated entirely. However, in versions of BASIC that do not have a DO verb, and in assembler, GO TOs are often necessary. Utmost care is urged whenever a GO TO is used; it should only be used as a last resort. In assembly language, arbitrary jumps or branches should be avoided.

When coding the program, the order of the subroutines is not crucial. The only piece of code that must be fixed in a certain location is the highest-level bracket, which must be the first executable line, or lines, of code. One possible way of coding the first section is to omit the first bracket and consider the code as the main program. For BASIC, subroutine calls are left unnumbered until the subroutine is actually written. In this case, we use nnn to indicate an unknown number.

100 REM CHECKBOOK BALANCE REPORT PROGRAM
110 REM BEGIN PROGRAM
120 GOSUB nnn
130 REM YEAR (1,Y)
140 LET ENDYR = FALSE
150 GOSUB nnn
160 IF ENDYR = FALSE THEN GOTO 150
170 REM END PROGRAM
180 GOSUB nnn
190 END

Another way to program this section would be to have the above piece of code as a subroutine to an even higher-level procedure, as follows:

80 REM CHECKBOOK BALANCE REPORT PROGRAM
90 GOSUB 110
95 END
    (100 through 180 as above)
200 RETURN

Either way of coding is acceptable. Note that the GO TO in statement 160 is used to create the structure of a DO UNTIL, a feature that is not available with this particular BASIC.

The center path of the Warnier-Orr diagram is the easiest to begin to code at this point, so the code for the YEAR, MONTH, and DAY routines is shown next. For the subroutine YEAR:

250 REM YEARLY PROCEDURE
260 REM BEGIN YEAR
270 GOSUB nnn
280 REM MONTHS (1,M)
290 LET ENDMO = FALSE
300 GOSUB nnn
310 IF ENDMO = FALSE THEN GOTO 300
320 REM END YEAR
330 GOSUB nnn
340 RETURN

For the subroutine MONTH:

350 REM MONTHLY PROCEDURE
360 REM BEGIN MONTH
370 GOSUB nnn
380 REM DAYS (1,D)
390 LET ENDAY = FALSE
400 GOSUB nnn
410 IF ENDAY = FALSE THEN GOTO 400
420 REM END MONTH
430 GOSUB nnn
440 RETURN

For the subroutine DAY:

450 REM DAILY PROCEDURE
460 REM BEGIN DAY
470 GOSUB nnn
480 REM TRANSACTIONS (1,T)
490 LET ENDTR = FALSE
500 GOSUB nnn
510 IF ENDTR = FALSE THEN GOTO 500
520 REM END DAY
530 GOSUB nnn
540 RETURN

The TRANSACTIONS process breaks down as follows:

550 REM TRANSACTIONS ROUTINE
560 REM CREDIT (0,1) OR DEBIT (0,1)
570 IF CDFLAG = CREDIT THEN GOSUB nnn ELSE GOSUB nnn
580 REM END TRANSACTION
590 GOSUB nnn
600 RETURN
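Every (1,n) repetition in these routines uses the same pattern (statements 140 through 160, for example): clear an end flag, GOSUB the repeated bracket, and GO TO back to the GOSUB while the flag is still FALSE. The repeated bracket therefore always runs at least once, which is what the 1 in (1,y), (1,m), (1,d) and (1,t) requires. A minimal Python sketch of that DO UNTIL emulation, using a hypothetical do_until helper whose body callable returns True once it has set its end flag:

```python
def do_until(body) -> int:
    """Run body at least once, repeating until it reports the end condition.

    Mirrors:  LET END = FALSE / GOSUB body / IF END = FALSE THEN GOTO (the GOSUB)
    Returns the number of times body ran.
    """
    count = 0
    while True:
        count += 1
        if body():      # body returns True when it has set the end flag
            break
    return count
```

The test sits after the call, as in a DO UNTIL, rather than before it as in a DO WHILE; that is why the BASIC loops jump back to the GOSUB and not to the LET that clears the flag.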



Subroutine DEBIT is coded a bit differently from the way it was designed, for one simple reason: BASIC will let you output from the same fields that were read in as input, and many languages do not. Therefore, the only code remaining in the subroutine is the subtraction of the amount from the running balance and the print commands.

610 REM DEBIT PROCEDURE
620 LET RUNBAL = RUNBAL - AMOUNT
650 PRINT ON PRINTR: DAY, CHKNUM, DESC1, DRAMT, CRAMT, RUNBAL
660 IF DESC2 # SPACES THEN PRINT ON PRINTR: DESC2
670 PRINT ON PRINTR: SPACES
680 RETURN

The symbol # is the not-equal operator. Note that this code makes no attempt to format the output line. Although the facility is available with this version of BASIC, it differs greatly from the line-formatting facilities of other BASICs, and would serve only to confuse the immediate issue.

The CREDIT process is very similar to the DEBIT process.

690 REM CREDIT PROCEDURE
700 LET RUNBAL = RUNBAL + AMOUNT
730 PRINT ON PRINTR: DAY, DESC1, CRAMT, DRAMT, RUNBAL
740 PRINT ON PRINTR: SPACES
750 RETURN

The only remaining subroutines to be coded appear below:

760 REM END TRANSACTION
763 LET OLDDAY = DAY
770 INPUT FROM CHECK1: DAY, CHKNUM, CDFLAG, DESC1, DESC2, AMOUNT
775 ON ENDFILE GOSUB 790

continued on page 24 
Figure 3: The original flowchart in figure 2 converted into a Warnier-Orr diagram. This is a much simpler-looking diagram and is easier to follow and explain to someone. Since it is broken down into sections, it can be programmed as a series of subroutines that can be easily maintained and modified. Note that an item with a bar over it means "the complement of item."
Listing 1: BASIC source listing for the checkbook balance report program. Each of the subroutines can be matched with one of the brackets in the diagram of figure 1. The individual modules that do not contain any code should be left as they are to facilitate easy maintenance in the future.



100 

110 
120 
130 
140 
150 
160 
170 
180 
190 
200 
210 
220 
230 
240 
250 
260 
270 
280 
290 
300 
310 
320 
330 
340 
350 
360 
370 
380 
390 
400 
410 
420 
430 
440 
450 
460 
470 
480 
490 
500 
510 
520 
530 
540 
550 
560 
570 
580 
590 
600 
610 
620 
630 
640 
650 
660 
665 
670 
680 
690 
700 
710 
720 
730 
740 
750 
760 
770 
775 
780 
790 
800 
810 
820 
830
840 
850 
860 
870 
880 
890 
900 
910 
920 
930 
940 
950 
960 



REM CHECKBOOK BALANCE REPORT PROGRAM 

REM BEGIN PROGRAM 
GOSUB 1090 

REM YEAR (1,Y) 

LET ENDYR = FALSE 

GOSUB 280 

IF ENDYR = FALSE THEN GOTO 170

REM END PROGRAM 
GOSUB 1290 
END 

REM **************************************** 
REM YEARLY PROCEDURE 

REM BEGIN YEAR 
GOSUB 1470 

REM MONTH (1,M) 

LET ENDMO = FALSE 

GOSUB 430

IF ENDMO = FALSE THEN GOTO 320 

REM END YEAR 

GOSUB 1390 

RETURN 

REM **************************************** 
REM MONTHLY PROCEDURE 

REM BEGIN MONTH 
GOSUB 1210 

REM DAYS (1,D) 

LET ENDAY = FALSE 

GOSUB 580 

IF ENDAY = FALSE THEN GOTO 470 

REM END MONTH 
GOSUB 1340 

RETURN 

REM **************************************** 
REM DAILY PROCEDURE 

REM BEGIN DAY 
GOSUB 1500 

REM TRANSACTIONS (1,T) 
LET ENDTR = FALSE
GOSUB 720 
IF ENDTR = FALSE THEN GOTO 620 



REM END DAY
GOSUB 1430 
RETURN 



REM **************************************** 
REM TRANSACTIONS PROCEDURE 

REM CREDIT (0,1) OR DEBIT (0,1)

IF CDFLAG = DEBIT THEN GOSUB 800 ELSE GOSUB 890 

REM END TRANSACTION 
GOSUB 965 
RETURN 

REM **************************************** 
REM DEBIT PROCEDURE 

LET RUNBAL = RUNBAL - AMOUNT 

PRINT :DAY;CHKNUM;DESC1;' ';AMOUNT;RUNBAL

IF DESC2 # ' ' THEN PRINT :SPACES;DESC2

PRINT :SPACES

RETURN 

REM **************************************** 
REM CREDIT PROCEDURE 

LET RUNBAL = RUNBAL + AMOUNT 

PRINT :DAY;' ';DESC1;AMOUNT;' ';RUNBAL

PRINT : SPACES 

RETURN 

REM **************************************** 
REM END TRANSACTION 



965 

970 

980 

985 

990 

1000 

1010 

1020 

1025 

1030 

1040 

1050 

1060 

1070 

1080 

1090 

1100 

1110 

1120 

1130 

1140 

1150 

1160 
1170 
1180 
1190 
1200 
1210 
1220 
1230 
1235 
1240 
1250 
1260 
1265 
1270 
1280 
1290 
1300 
1310 
1315 
1320 
1330 
1340 
1350 
1360 
1365 
1370 
1380 
1390 
1400 
1405 
1410 
1420 
1430 
1440 
1445 
1450 
1460 
1470 
1480 
1485 
1490 
1495 
1500 



LET OLDDAY = DAY 

INPUT FROM CHECKS: DAY, CHKNUM, CDFLAG, DESC1, DESC2, AMOUNT

ON ENDFILE CHECKS GOSUB 1030 

IF OLDDAY # DAY THEN LET ENDTR = TRUE

RETURN 

REM *************************************** 
REM END OF FILE 

LET ENDAY, ENDMO, ENDTR, ENDYR = TRUE 
RETURN 

REM *************************************** 
REM BEGIN PROGRAM PROCEDURE 

OPEN CHECKS 1 , SYMBOLIC, INPUT: CHECKS 

STRING SPACES, CDFLAG, DESC1, DESC2, MONTH 

DECIMAL AMOUNT, BALANC, RUNBAL 

LET TRUE = 1 

LET FALSE = 0

LET SPACES = ' '

INPUT FROM CHECKS: DAY, CHKNUM, CDFLAG, DESC1, DESC2,

AMOUNT, BALANC, & MONTH, YEAR 

RETURN 

REM *************************************** 
REM BEGIN MONTH 

PRINT :'CHECKBALANCE REPORT'

PRINT :'FOR THE MONTH OF ';MONTH;YEAR
PRINT :SPACES,'BALANCE FORWARD OF ';BALANC
LET RUNBAL = BALANC
PRINT :'DAY CHECKS FOR DEBIT CREDIT BALANCE'
RETURN



REM *************************************** 
REM END PROGRAM 

CLOSE CHECKS 
RETURN 

REM *************************************** 
REM END MONTH 

PRINT :'CURRENT BALANCE ', RUNBAL
RETURN 

REM *************************************** 
REM END YEAR
RETURN



REM *************************************** 
REM END DAY 



RETURN 



REM *************************************** 
REM BEGIN YEAR 



RETURN 



REM *************************************** 
REM BEGIN DAY 



RETURN 
continued from page 21



778 IF OLDDAY # DAY THEN LET ENDTR = TRUE
780 RETURN
790 REM END OF CHECK FILE DEFAULT SUBROUTINE
800 LET ENDAY, ENDTR, ENDMO, ENDYR = TRUE
810 RETURN
820 REM BEGIN MONTH PROCEDURE
830 PRINT ON PRINTR: HDR1$
840 LET RUNBAL = BALANC
850 PRINT ON PRINTR: RUNBAL
860 PRINT ON PRINTR: HDR2$
870 PRINT ON PRINTR: SPACES
880 RETURN
890 REM END MONTH PROCEDURE
900 PRINT ON PRINTR: RUNBAL
920 RETURN



The program is finished with the BEGIN PROGRAM and END PROGRAM subroutines, which are not developed here, and by replacing the GOSUB nnn placeholders coded earlier with actual line numbers. The modules for which a GOSUB was generated should probably remain part of the program even though they contain no code; they make maintenance much easier. The entire working program, with formatting and other embellishments, appears in listing 1.
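The END TRANSACTION routine is a control break: it saves the current day, reads the next record, and sets ENDTR when the day changes, or sets every end flag at end of file. A minimal Python sketch of that logic, assuming the file is held in a list and the flags live in a dictionary (both choices are hypothetical, made only to keep the example self-contained):

```python
END_FLAGS = ("ENDTR", "ENDAY", "ENDMO", "ENDYR")


def end_transaction(state: dict, records: list) -> None:
    """END TRANSACTION: remember the day, read the next record, set end flags."""
    old_day = state["record"]["day"]           # LET OLDDAY = DAY
    state["pos"] += 1                          # INPUT FROM CHECKS: ...
    if state["pos"] >= len(records):           # ON ENDFILE: every loop must stop
        for flag in END_FLAGS:
            state[flag] = True
        return
    state["record"] = records[state["pos"]]
    if state["record"]["day"] != old_day:      # IF OLDDAY # DAY
        state["ENDTR"] = True                  #   THEN LET ENDTR = TRUE
```

Like the BASIC version, the sketch relies on the DAY routine to set ENDTR back to FALSE (statement 490) before the next day's transactions are processed.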

Conclusion 

The art of programming has become a process which can be taught to anyone who needs to use it, which is something that we have not been able to accomplish until very recently. Admittedly, the technique for developing programs presented here is sometimes tedious and not very creative, but it will get the job done. In the personal computer field a lot of enthusiasts probably enjoy programming on the fly and spending all night debugging. But for those who don't, including myself, and who aren't satisfied with just running someone else's canned programs, there is an alternative. As the pioneer in this methodology, Jean-Dominique Warnier, puts it: "If you don't have time to do it right, do you have time to do it over?" Realistically, one cannot say that this methodology is the ultimate in software process design or that it is completely right. It is not. Something is sure to come along in the future that is better. But, for now, it is certainly a large step in the right direction.



Once I finished reading about the ease with which Warnier-Orr diagrams could be used, I decided to take a sample flowchart and convert it into the Warnier-Orr form to see how much of a difference there actually was. I happened to be working on an article by Geoffrey Gass (entitled "Starfleet") which contained a large number of flowcharts. Choosing one at random, I converted it. Figure 2 is the original flowchart; figure 3 is the converted diagram. I think the Warnier-Orr form is much easier to read and understand.

When designing with flowcharts it is 
sometimes difficult not to cross lines or 
have a great deal of redundancy in the pro- 
gram which makes it difficult to follow. 
All the arrows going across the paper are 
very distracting and hard to follow. The 
Warnier-Orr diagram does not have this 
disturbing problem. It is very easy to fol- 
low the program through the various 
subroutines. 

The Warnier-Orr diagram lends itself 
to structured program writing. If you con- 
sider each of the separate brackets another 
subroutine it is very easy to write the pro- 
gram just as it stands from top to bottom. 
When we use conventional flowchart tech- 
niques we end up leaping about the program 
to perform statements that are at various 
parts of the same routine. In my opinion 
the Warnier-Orr diagram is a quantum leap 
in the direction of aid for structured pro- 
gram designers. 

Ray Cote 

Editor 

BYTE Publications 
Warnier-Orr Diagrams:

Some Further Thoughts



GT Wedemeyer 

The article "Structured Program Design" in the October 1977 BYTE, page 146¹, has certainly simplified my thinking. However, the use of the symbol ⊕ seems to violate a rule implicit in the Warnier-Orr diagram that one need not, and in fact must not, go up in a list contained within a bracket of a given order. The ⊕ symbol requires checking up and down the list of case statements. I believe that what is meant is illustrated in figure 1. In this example CASE J is equivalent to ROLL = "J." This manner of diagramming clarifies the relationship between statements having alternatives and statements not having alternatives. It also eliminates the need for the instruction SKIP, since the finding of no more items in a list of a given order is the equivalent of an instruction to return to the proper place in the list of the next lower order, where the order of a list is its position from left to right as shown in figure 2.

I would like to define the instruction 
RETURN to mean "in the list of next lower 
order than the list in which this instruction 
is found, complete the step immediately 
following the lowest completed step." 
Although this instruction seems implicit, 
as I indicated above, I would prefer that 
it be explicitly stated, and I think it would 
make the diagrams more easily followed. 

Dave Higgins replies: 

It appears from your letter that you are 
very interested in using the Warnier-Orr 
diagramming techniques. I think you will be 
pleased with the results. 

I'd like to comment on the suggestions 
you made for improving the diagrams. 
Unlike flowcharts, which have become quite 
rigid and inflexible in form, the Warnier- 
Orr diagrams are still in a relative infancy, 
and do still change occasionally. We here at 
Langston, Kitch have made some minor 
modifications to the diagrams in the last 
year in order to add some capabilities that 
were previously vague or nonexistent. 
We are continually evaluating the diagrams, 
looking for shortcomings or ambiguities, 
and therefore welcome suggestions along 
these lines. It is in this light that I considered 
your suggestions for revising some of the 
notation. 



Figure 1.

BEGIN TURN (= ROLL DIE)
    DETERMINE "J"
        PICK RANDOM "J" BETWEEN ONE AND SIX
    CHOOSE CASE "J"
        CASE 1
        CASE 2
        CASE 3
        ...
        CASE 6
    (RETURN)



Figure 2.

ORDER 1    ORDER 2    ORDER 3    ...    ORDER n



First of all, with respect to your ideas concerning the representational form of a CASE statement: I think your objection to the use of the ⊕ symbol stems from the fact that there are two primary ways to actually code a CASE structure. One way is with the use of a "computed GOTO or GOSUB." The diagram you show is ideally suited for translation into a computed GOTO, which would look something like listing 1. But I don't think this is a worthwhile change to make to the basic form of the diagrams themselves. The reason is this: although your method works fine for CASE statements that lend themselves to computed GOTOs, there is a whole host of other CASE statements where the use of a computed GOTO is an extreme inconvenience. Take, for example, the CASE of figure 3. It would be inconvenient to have to rig up a computed GOTO to execute this CASE. It is much simpler to code it using a "nested IF" statement, which is the other
Figure 3.

DAY = MONDAY (0,1)
    ~~~~
⊕
DAY = TUESDAY (0,1)
    ~~~~
⊕
DAY = WEDNESDAY (0,1)
    ~~~~



Listing 1.

300 REM CASE STATEMENT
310 REM DETERMINE CASE "J"
320 LET J=INT(RND(0)*6+1)
330 ON J GOTO 340,380,420,460,500,540
340 REM CASE 1
350     case 1 process
370 GOTO 570
380 REM CASE 2
390     case 2 process
410 GOTO 570
        cases 3-6 as above
570 REM END CASE



Listing 2.

300 REM CASE STATEMENT
310 IF D$="MONDAY" THEN 330 ELSE IF D$="TUESDAY" THEN 360
      ELSE IF D$="WEDNESDAY" THEN 400
320 GOTO 440
330 REM CASE 1: DAY = MONDAY
        monday process
350 GOTO 440
360 REM CASE 2: DAY = TUESDAY
        tuesday process
380 GOTO 440
390 REM CASE 3: DAY = WEDNESDAY
        wednesday process
440 REM END CASE



Listing 3.

300 REM CASE STATEMENT
310 IF D$="MONDAY" THEN 350
320 IF D$="TUESDAY" THEN 400
330 IF D$="WEDNESDAY" THEN 450
340 GOTO 500
350 REM CASE 1: DAY = MONDAY
        monday process
390 GOTO 500
400 REM CASE 2: DAY = TUESDAY
        tuesday process
440 GOTO 500
450 REM CASE 3: DAY = WEDNESDAY
        wednesday process
500 REM END CASE



popular way to code CASE statements. In pseudocode, this CASE is:

IF DAY = MONDAY
    THEN MONDAY-ROUTINE
    ELSE IF DAY = TUESDAY
        THEN TUESDAY-ROUTINE
        ELSE IF DAY = WEDNESDAY
            THEN WEDNESDAY-ROUTINE

You can see the natural one-to-one correspondence between the Warnier-Orr diagram and the pseudocode. This is easily translated to code in listing 2. Listing 3 shows an alternative for those BASICs without the nested IF capability. This is the preferred method for coding a CASE statement because it will work for all CASE statements, regardless of whether or not the CASE is suited for a computed GOTO. Also, with the computed GOTO, you must be sure that your "J" is restricted to the proper range. This is not to say that you can never use the computed GOTO; just be sure that its use is justified, and then be very careful. Personally, I feel it is more trouble than it is worth.
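The two CASE styles discussed above can be contrasted in a short Python sketch. case_computed stands in for ON J GOTO by indexing a table, so it must first check that J lies between 1 and 6, which is the range restriction just mentioned; case_nested_if mirrors the nested IF of listing 2 and works for any test, not only small integers. The function names and return strings are illustrative only.

```python
def case_computed(j: int) -> str:
    """Table dispatch, analogous to a computed GOTO on J."""
    cases = ("case 1", "case 2", "case 3", "case 4", "case 5", "case 6")
    if not 1 <= j <= len(cases):               # guard the range explicitly
        raise ValueError(f"J out of range: {j}")
    return cases[j - 1]


def case_nested_if(day: str) -> str:
    """Nested IF form: each alternative is an arbitrary condition."""
    if day == "MONDAY":
        return "monday process"
    elif day == "TUESDAY":
        return "tuesday process"
    elif day == "WEDNESDAY":
        return "wednesday process"
    else:
        return "no case selected"              # falls through to END CASE
```

The table form needs the explicit guard because nothing else stops a bad index; the nested IF form has no such hazard, which is the letter's argument for preferring it.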

As for the elimination of the brackets 
with "SKIP" in them: I don't believe that 
you really want to do this. For instance, in 
the BUG game published in the October 
1977 BYTE², no action is taken when a
player rolls a "BODY" on the dice but 
already has a body. This bracket is filled 
with the notation "SKIP," which indicates 
that, although the bracket is an essential 
part of the logic of the diagram, nothing 
is to be done there. However, in future 
versions of the game, you might just decide 
to tell the player that "YOU ALREADY 
HAVE A BODY" when that condition 
occurs. If the original diagram is left with 
the empty brackets intact, you have a 
fixed and ready place to put that PRINT 
command. The design is very easy to change 
and the documentation for the new program 
is only a matter of erasing one line and 
replacing it with another. 

Also, I don't believe that we need to add the (RETURN) command at the end of the brackets as you suggest. As you state, the return to the next highest level in the diagram is already implied at the end of each bracket; therefore, adding (RETURN) on each bracket would amount to a lot of "busywork," which would clutter up the diagrams with a lot of unnecessary information.

Again, I'd like to thank you for your suggestions and extend an invitation for all the readers of BYTE to submit their suggestions for improvement of the Warnier-Orr diagrams to either Langston, Kitch and Associates or to me for examination.
An Outline Method For
Program Design



Jerry Goff 



Listing 1: FORTRAN version of BUG program using logical diagramming comments. The entire logic of the program is inserted at the start for documentation purposes. This way you are never further from the documentation than a program listing.



Since I am dedicated to being human, I always try to maximize the returns of an effort while minimizing the effort (a more technical way of saying "Getting the mostest for the leastest"). Therefore, when I read the article by David A Higgins on the Warnier-Orr diagram techniques¹, I thought "AHA!" (or something like that). This is it! The method I now use requires thought, logic and care to get good results. Perhaps the Warnier-Orr method is easier.

I carefully, logically, and thoughtfully 
constructed a Warnier-Orr diagram of a 
program. It worked. I then carelessly, 
illogically, and unthinkingly constructed 
a Warnier-Orr diagram of a program. It not 
only bombed, it hung the computer up. 

The conclusion is, therefore, obvious. If 
I already have a method that works every 
time I use it and I'm familiar with it, why 
change? Well, so much for the obvious. If 
it's better, change. 

I studied the Warnier-Orr diagram that Mr Higgins included in his article to determine if it was better than my method or if it had something more to offer (I carefully laid aside most of my prejudices), when lo and behold, the two methods are the same; only the form is different. Let me sneak in an advantage of my method: it can be stuck in the program as a remark.

I did just that in my version of BUG. 
You can see that my version (listing 1) of 
the Warnier-Orr method is in the form of a 
simple block outline similar to the type 
forgotten from school. It simply outlines 
in logical sequence what you want done. 
Whenever a question needs to be answered, 
a substatement is generated until all the 
questions are answered. If nothing happens, 
simply continue on (just like life). 

Try either the Warnier-Orr method or 
this method. They both work and all you 
have to lose are ulcers, sleepless nights. . . 
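To show how such an outline maps onto code, here is a hedged Python sketch of the D) ROLL DIE part of the BUG outline in listing 1, together with the win test. The part indices follow the listing's comment (1=BODY through 6=LEGS, zero-based here); the function names are hypothetical, and a roll of 1 simply gives a body, as the outline states. Each question in the outline becomes an if, and each nested "YES" becomes a nested condition.

```python
# Part indices, following the listing's comment 1=BODY ... 6=LEGS (zero-based).
BODY, NECK, HEAD, ANTENNA, TAIL, LEGS = range(6)


def apply_roll(bug: list, die: int) -> None:
    """D) ROLL DIE: update one player's part counts for a roll of 1-6."""
    if die == 1:                                   # DIE=1?
        bug[BODY] = 1                              #   YES GIVE A BODY
    elif die == 2:                                 # DIE=2?
        if bug[BODY]:                              #   HAVE A BODY?
            bug[NECK] = 1                          #     YES GIVE A NECK
    elif die == 3:                                 # DIE=3?
        if bug[NECK]:                              #   HAVE A NECK?
            bug[HEAD] = 1                          #     YES GIVE A HEAD
    elif die == 4:                                 # DIE=4?
        if bug[HEAD] and bug[ANTENNA] < 2:         #   HEAD and fewer than 2?
            bug[ANTENNA] += 1                      #     YES GIVE 1 ANTENNA
    elif die == 5:                                 # DIE=5?
        if bug[BODY]:                              #   HAVE A BODY?
            bug[TAIL] = 1                          #     YES GIVE A TAIL
    elif die == 6:                                 # DIE=6?
        if bug[BODY] and bug[LEGS] < 6:            #   BODY and fewer than 6?
            bug[LEGS] += 1                         #     YES GIVE 1 LEG


def is_winner(bug: list) -> bool:
    """6 legs, 2 antennae and 1 tail."""
    return bug[LEGS] == 6 and bug[ANTENNA] == 2 and bug[TAIL] == 1
```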



FTN4,L
      PROGRAM BUG
C
C ****************************************************************
C *
C * A) DIMENSION ARRAYS
C * B) INITIALIZE PARAMETERS
C * C) SET COUNTER FOR COMPUTERS TURN
C * D) ROLL DIE
C *       DIE=1?
C *          YES GIVE A BODY
C *       DIE=2?
C *          YES
C *             HAVE A BODY?
C *                YES GIVE A NECK
C *       DIE=3?
C *          YES
C *             HAVE A NECK?
C *                YES GIVE A HEAD
C *       DIE=4?
C *          YES
C *             HAVE A HEAD?
C *                YES
C *                   FEWER THAN 2 ANTENNAE?
C *                      YES GIVE 1 ANTENNA
C *       DIE=5?
C *          YES
C *             HAVE A BODY?
C *                YES GIVE A TAIL
C *       DIE=6?
C *          YES
C *             HAVE A BODY?
C *                YES
C *                   FEWER THAN 6 LEGS?
C *                      YES GIVE 1 LEG
C *       ARE THERE 6 LEGS FOR THIS PLAYER?
C *          YES
C *             ARE THERE 2 ANTENNAE?
C *                YES
C *                   IS THERE 1 TAIL?
C *                      YES SAVE THIS PLAYER AS A WINNER
C *       HAVE BOTH PLAYERS HAD THEIR TURN?
C *          NO SET COUNTER FOR PLAYERS TURN / CONTINUE AT D
C *       IS THERE A WINNER?
C *          NO CONTINUE AT C
C *          YES PRINT THE SCORES
C *             IS THE COMPUTER THE WINNER?
C *                YES PRINT THE COMPUTER WINS
C *                NO PRINT THE PLAYER WINS
C *             ARE THEY BOTH WINNERS?
C *                YES PRINT IT'S A DRAW
C * E) PLAY AGAIN?
C *          YES CONTINUE AT B
C *          NO END PROGRAM
C *
C ****************************************************************
C
C * HP 21MX COMPUTER     JERRY E. GOFF
C
C * BUG(…) 1=BODY 2=NECK 3=HEAD 4=ANTENNA 5=TAIL 6=LEGS
C
C * BUG(…,2)=PLAYER
C
C * WIN(1)=COMPUTER WIN(2)=PLAYER
C
C ****************************************************************
C
      DIMENSION BUG(6,2),WIN(2),ITIME(5),IYEAR(1)
      DATA L,M,INTEG,REAL /181,66,325,325.0/
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Listing 1 } continued: 



00*9 

0071 
0«72 
0073 
W74 
0075 
0076 
cn«77 
0078 
0079 
00B0 
00*1 
0082 
00P3 
0084 
0085 
0086 
0087 
0088 
0089 
0090 
0091 

0093 
0094 
0095 
0098 
0097 
0098 
0(399 
0100 
0101 
0102 
01tf3 
0104 
0105 
0106 
0107 
0108 
0109 
0110 
^ill 
0112 
0113 
0114 
0115 
0116 
0117 
0J18 
0119 
0120 
0121 
0122 
0123 
0124 
0125 
0120 
0127 
0128 
0129 
0130 
0131 
0132 
0133 
0134 
013D 
0136 
0137 
0138 
0139 
0140 
0141 
0142 
0143 

0149 
0146 
0147 
0140 
0M9 
0150 
0131 
0152 
0153 
0154 
0155 
0156 
0157 
0158 
0159 
0160 
0161 
0162 
0163 



C 

c 
c 
c 
c 

10 

20 

30 

40 

50 

60 
C 

c 
c 
c 
c 

100 

c 
c 
c 
c 
c 

75 

80 

05 

C 
C 

c 
c 
c 

c 
c 
c 
c 
c 



110 

120 

130 

C 
C 

e 
c 
e 

M0 



DO 3 I it 1,2 

DO 2 Jsi,6 
RUG(J,I)b0 

CONTINUE 
CONTINUE 

* CALL THE TINE FROM THE COMPUTER & 

ICOOEsll 

CALL EXEC(ICOPE,ITIME,lYEAR) 

JUFLOATCITIME(l)) 
XsX/100,0 

a START THE GAME, COMPUTER 1ST o 

no 100 KM, 2 

&#6fe#&*&ftftDngr6ftft&6 

* ROLL THE DIE ft 

IKsINT(KftREAL) 
IRANDsMODtM*IX*L»INTEG3 
Xs (FLOAT (IRAND)*« tt 5} /REAL 
N«INT(1*.0#X) 



ft GO TO THE ADDRESS CALLED BY THE DIE a 

&ftft&ftftft$d&ftftftft«'66ft«ftdftftft&ft&ft&ft&ft&ft&3&ft$$&ft&0 

GO TO (10, 20, 30, 40, 50,60), N 

BUG(l,K)tj 

GOTO 70 

IP (BUGtl»K) # EO.t) BUG(2,K)si 
GOTO 70 

IF (8UG(2,K) EQ t> l) 8UG(3,K)®1 
GOTO 70 

IF (BUG(3,K).fcQ.i,AM0.BUG(4,K),lT.2) BUG (4,« ) ®8UG (4,K>*1 
GOTO 70 

IF tRUGU,K).EO B i) BUG(5,K)01 
GOTO 70 

IF(Bl»G(l,K) t E0,l t AND,BUG(6,K),LT.B) BUG (6,K) ®BUG{6, K)*| 

* CHECK IF THERE IS A WINNER a 

&6ti&ft6ftfttt6ft0ftftftaftft&6ft«fe&&6&ftftOlS l ftg6$£££g 

IF(BUG{4 f K) t E0,2.4ND,BU6(5 f K) i EO t i t ANO.BU6{e # K).EO.6) ^IN(K)s| 

CONT INUE 

* JUST SOME FORMAT STATEMENTS * 

6&&&ftft6A&&&ft&ft&6a6&6&ftftft&&&ft£ft&ft0ft&6$ft 

FORMAT (/."COMPUTER HAS *") 
FORMAT (» PLAYER HAS «•«) 

FORMAT (I2,2S£, "BODY, »,I2,2X, "HEAD, «, 12, 2B, "HECK e % 
6l2#2X," ANTENNA, 'M2,2X,«TAIL,",I2, 2*, "LEGS") 

6*Oft*ft«fftft*©fr^ftd«rSftft'&6#fi&#66t!f&fe#iSrO6*©*60*fi0**ft0000^&©O 

* CHECK FOR A WINNER AFTER BOTH HAVE PLAYED • 
&i*ft&ft& 6« «&fta0aafcfifrftG8a6$ftA&«i&ft6ftaft&fta&«rft©ftftft&& &&$&$&& 

X^ (wIWti).eo,0.4NO.wIN(8).CQ e 0) GOTO 

* IF THERE IS A WINNER, WRITE THE SCORES AND WHO HON a 

6 6 6 ft a 6 3 A *********************#*********•#*****#****»#****»# 

(mm ITt C IS* 755 

WRITE (10,85) BUGCl»n»BUG(2,n,aUGO,l) f BUGC4in,BUG(S,l)»BUGC6«l) 
WRITE(10,80) 

WRITF (10,85) BUG(i,2),BUG(2,2),8UG(3,2),BUG<4 P 2),BU6(S,g),BU6(6,g) 

IP (WINCU.ca.i) WRITEU0,110) 
FORMAT C" DUE TO INCREDIBLE SKILL, I WIN") 

IF (WlN(2) EQ e l) WRITEU0,120) 
FORMAT (" WITH ALL YOUR LUCK, YOU MANAGED TO WIN") 

IF CWlNCD.EQ.l.AND.WINUKeQ.l) WRXTEOO, 130) 
FORMAT (« BUT IT'S A DRAW ANYHOW",/) 
WRITE (10,148) 

0O*ftft i ft*r&o«t6d*ft«fOO6*<Sr«06^i&fiftft«t0@ftoei8fS'&0ftft®ofe©0dO66©(&&®000^^0*©& 
a PROGRAM EXIT (THIS IS HANDY m STOPPING THE PROGRAH) a 

FORMATC* WANT TO PLAY AGAIN? 1®YE3 P geN0 &») 
REAO(10,&) ANS 

IF (ANS EQ l) GOTO 1 
Common Mistakes

Using Warnier-Orr Diagrams 



David A Higgins 



In my opinion, one of the best program and system design methods is the Warnier-Orr structured systems design approach, which I described previously ("Structured Program Design," page 146, October 1977 BYTE¹; "Structured Programming with Warnier-Orr Diagrams," page 104, December 1977 BYTE² and page 122, January 1978 BYTE³). This article is being presented because of the interest expressed in this subject, and because a lot of people will be trying these techniques for the first time. Newcomers to this methodology often have many questions about their work, and want to know whether or not what they are doing is correct. The purpose of this article is to outline a few of the more common mistakes that beginners make when using this technique.

Philosophical Errors 

Many first-time users of the Warnier-Orr diagrams tend to make mistakes which are so similar that they are worth examining. The biggest and most common mistakes tend to be a direct result of what we can call philosophical errors: not really a misuse of the techniques so much as a misunderstanding of them. The most common error stems from the fact that many computer programmers tend to be obsessed with the desire to write some kind of code at the very beginning of the design process. This problem usually manifests itself in any or all of the following three ways:

• Trying to code the program while designing it (called the design-a-little, code-a-little approach).

• Relying too heavily on language restrictions and considerations while doing logical design.

• Skipping the design phase altogether because:

  a) the program is "too easy" or

  b) the programmer is "too smart."

Any of the above practices will destroy most if not all of the effectiveness of the Warnier-Orr methodology [or any other structured programming methodology, for that matter. . . CH]. It will certainly cause you to waste a great deal of time.

¹ Page 9 of this edition.
² Page 14 of this edition.
³ Page 19 of this edition.

Editorial Note. . .

Since publishing David Higgins' first two articles on Warnier-Orr diagramming techniques, we have received a number of letters from people expressing the message (paraphrased): "If I have this or that self-documenting structured programming language, why should I use Warnier-Orr techniques? After all, if a program in my language is logically equivalent to the Warnier-Orr structure, and it is directly executable, I see no need for an extra layer of documentation."

A very real answer to this objection is that it is correct. There is no point to using Warnier-Orr techniques if you properly use a language such as PASCAL which, having structured programming constructs built in and allowing long descriptive names for variables and procedures, can support self-documenting code.

But most currently used languages in personal computing do not easily support self-documenting code and modern concepts of structured programming. The usefulness of the Warnier-Orr methodology is that it provides a disciplined way of imposing such structure on a language such as BASIC, FORTRAN or assembly language. In effect, the Warnier-Orr discipline is a programming language which is intended for hand translation into one of the existing unstructured languages. . . Carl Helmers, Editorial Director

BYTE Publications

If you try to use the first technique, the 
design-a-little, code-a-little approach, you 
will probably be in for quite a bit of erasing 
or retyping when you have to change the 
design because you coded yourself into a 
corner that you can't design your way out 
of. Your program will tend to be twice as 
long as it should have been and half as 
efficient. You will probably be in for a lot of 
debugging runs while trying to put back into 
the code everything that you left out when 
you changed the design. As you can see, this 
technique just naturally generates problems. 

The second technique described above is a common mistake that veteran programmers almost always seem to make: relying too heavily on the programming language they will be using while doing the program design. Consider the two examples of program designs shown in figure 1. Both figures 1a and 1b are diagrams of the same process: computation of overtime wages. The diagram in figure 1a, however, is the type that veteran programmers will almost always try to draw. Note its heavy stress on the language aspect of the function. It almost looks like part of a BASIC program cut out and pasted on a diagram. Contrast that diagram with the one of figure 1b, which correctly details the logical process being performed. You can see that if figure 1a were the only documentation for this particular procedure, you would probably not be able to tell what that piece of code was supposed to be doing. You might have some idea, because the program seems to have semimeaningful field names from which you might deduce some purpose. All we can tell for sure from figure 1a is that some part of the program is going to crunch a couple of numbers. Which numbers it is going to crunch, and what for, is anyone's guess. On the other hand, it is impossible to misunderstand what the process diagrammed in figure 1b is doing. It is very easy to read and comprehend because it shows the logical side of the procedure.

(a)

1000 {
    H1 > 40.0 (0,1) { V1 = (H1-40.0)*(S1*1.5)
    ⊕
    NOT H1 > 40.0 (0,1) { V1 = 0.0

(b)

COMPUTE OVERTIME PAY {
    HOURS WORKED > 40 (0,1) { SET OVERTIME PAY = (HOURS WORKED OVER 40) TIMES (SALARY AT TIME AND ONE-HALF)
    ⊕
    NOT HOURS WORKED > 40 (0,1) { SET OVERTIME PAY = ZERO

Figure 1: DOs and DON'Ts of Warnier-Orr diagramming. Figure 1a looks like actual program code and should not be used when trying to logically design a program. Figure 1b shows the correct method. The entire diagram contains only logical statements which could be coded into any computer language.

This stress of the logical over the physical while designing with the Warnier-Orr diagrams is essential to their correct usage. Designing as in figure 1a serves absolutely no purpose as far as understanding the process being described, and it is essentially worthless as documentation. Even though you might be able to tell what that diagram does the day you draw it, you probably won't be able to understand it in six months. Someone else who wants to use your documentation might never understand it.
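For comparison, the logical diagram of figure 1b carries over almost word for word into a language that allows descriptive names. The Python function below is one possible coding, offered only as a sketch; the function and parameter names are illustrative assumptions, and it assumes "salary" means an hourly rate, as the "time and one-half" wording suggests.

```python
def compute_overtime_pay(hours_worked: float, hourly_salary: float) -> float:
    """COMPUTE OVERTIME PAY, coded from the logical diagram of figure 1b."""
    if hours_worked > 40:
        # (hours worked over 40) times (salary at time and one-half)
        return (hours_worked - 40) * (hourly_salary * 1.5)
    # the exclusive-OR branch: hours worked not over 40
    return 0.0
```

With 45 hours at 10.0 per hour this gives 5 × 15.0 = 75.0; at 40 hours or fewer it gives zero, exactly as the two branches of the diagram say.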

Figure 2: Typical productivity curve of a programmer being introduced to Warnier-Orr diagram methodology.

As long as we're on the subject of documentation, I might mention that throughout the development period of this technique, many people were concerned that the diagrams might become too far removed from the actual code, which would render them useless as effective documentation. They worried that since the diagrams depicted the logical side of the problem, they had little or no relevance to the physical (real world) side. Those fears were easily put aside with two diagramming and coding conventions, as follows:

• Physical mileposts on the Warnier-Orr 
diagrams. 

• Logical symbol tables in the programs. 

Thus, when we actually wrote code that looked like that of figure 1a, we would tie it to the logical figure 1b by adding the following to the diagram:

:STMT#1000
COMPUTE OVERTIME PAY {

This would be included in the program itself by using comment statements:



1000 REM COMPUTE OVERTIME PAY
1001 REM HFLD = HOURS WORKED
1002 REM OVTFLD = OVERTIME PAY
1003 REM SALFLD = SALARY



This gives us a very clear and concise one-to-one mapping between the logical diagram and the physical code, so cross-referencing between the two is quite easy. If, for instance, you want to know what a particular section of code is supposed to be doing, you need only look it up on the logical diagram. Similarly, if you want to find out which part of the program is carrying out a particular logical function, you have the location information at your fingertips. This is excellent documentation in the event that you or someone else might someday want to make a modification to your code.
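The two conventions amount to a pair of lookup tables. A minimal Python sketch, assuming one dictionary per table (the contents are the figure 1 example; the table and function names are hypothetical), shows the lookup in both directions and regenerates the REM lines given earlier:

```python
# Physical milepost: statement number -> logical process on the diagram.
MILEPOSTS = {1000: "COMPUTE OVERTIME PAY"}

# Logical symbol table: physical field name -> logical meaning.
SYMBOLS = {
    "HFLD": "HOURS WORKED",
    "OVTFLD": "OVERTIME PAY",
    "SALFLD": "SALARY",
}


def process_for_statement(stmt: int) -> str:
    """What is the code at this statement number supposed to be doing?"""
    return MILEPOSTS[stmt]


def statement_for_process(process: str) -> int:
    """Which statement carries out this logical function?"""
    for stmt, name in MILEPOSTS.items():
        if name == process:
            return stmt
    raise KeyError(process)


def comment_block(stmt: int) -> list[str]:
    """Generate the REM lines that tie the code at stmt back to the diagram."""
    lines = [f"{stmt} REM {MILEPOSTS[stmt]}"]
    for offset, (field, meaning) in enumerate(SYMBOLS.items(), start=1):
        lines.append(f"{stmt + offset} REM {field} = {meaning}")
    return lines
```

Keeping both tables in one place is the point of the convention: whichever side you start from, diagram or code, one lookup takes you to the other.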

The third common philosophical error, 
that of skipping the design phase altogether, 
is a real problem to most newcomers. In 
fact, if you look at a typical productivity 
curve for a programmer who is introduced to 
the Warnier-Orr diagrams, it generally looks 
something like the curve in figure 2. 

A currently productive programmer pro- 
ducing work at a constant rate up until the 
time the Warnier-Orr techniques are intro- 
duced (point A), will typically show an ini- 
tial burst of very high productivity (point B). 
This is usually followed by a slump (point C) 
where the programmer sinks back to or just 
above his previous level of work. Eventually, 
he will climb back up to a new, higher level 
of work (point D), where he will usually stay. 
This peculiar slump at point C seems to be 
primarily due to the fact that since the pro- 
grammer has begun to feel comfortable with 
the new technique and has had some initial 
success with it, he begins to feel confident 
enough to try to do the work without doing 
the diagrams first. He soon realizes that the 
quality of his work has dropped off and 
starts to do the diagrams once again, this 
time for good, and his work level rises up to 
a new, higher level that will remain fairly 
constant. 

Apparently, the only way to get new 
people to avoid this temptation is to fore- 
warn them that it does tend to happen, so 
that if and when they find themselves on the 
downhill side of the productivity curve, they 
can recognize the trap in time to escape the 
worst of it. 



DETERMINE UNIT PRICE {
    PRODUCT CODE = A (0,1) { UNIT PRICE IS IN PRICE FIELD #1; GO TO END PRICE
    ⊕
    PRODUCT CODE = B (0,1) { . . . GO TO END PRICE
    ⊕
    PRODUCT CODE = C (0,1) { GO TO COMPUTE MARKET PRICE
    END PRICE

Figure 3: Example case statement making use of logically illegal GOTO statements. When a set of statements is finished, the diagram will logically fall through all of the other exclusive ORs, ⊕, and arrive at the END PRICE section. Thus no GOTO need be shown.



So much for the philosophical errors. 
There are also a few common technical 
errors that people make, and we'll look at 
those next. 

Technical Errors 

For a lot of people who are just starting 
to program and may be unfamiliar with 
structured programming techniques, some of 
the diagramming methods may seem to be a 
bit uncomfortable. One of the most often 
seen technical errors is the attempted use of 
a GOTO statement on the diagram. The case 
statement shown in figure 3 illustrates this 
problem. 

Two of the occurrences of the GOTOs in figure 3 are incorrect and the other is ambiguous. The GOTOs in "PRODUCT CODE = A" and in "PRODUCT CODE = B" are unnecessary and incorrect: the default logical linkages will see to it that the appropriate steps are executed. The GOTO at "PRODUCT CODE = C" is unclear. If it is supposed to mean that we are to cease execution of this process and jump to the procedure "COMPUTE MARKET PRICE" to begin processing, then its usage is incorrect. If, on the other hand, it means that "COMPUTE MARKET PRICE" is a common utility routine and is described elsewhere in the system, then the GOTO is misleading. Instead, we should have written:



PRODUCT CODE = C (0,1) { COMPUTE MARKET PRICE ...SEE PAGE #3

if the process was expanded on a different page of the diagram, or something like the words "...SEE ABOVE" or "...SEE BELOW" if that process appears elsewhere on the same page. The GOTO is a physical entity to be used at execution and is not a logical relationship, so it does not belong on a logical Warnier-Orr diagram.

SELECT DAY OF THE WEEK {
    MONDAY (0,1) { . . .
    ⊕
    TUESDAY (0,1) { . . .
    ⊕
    FRIDAY (0,1) { . . .
    ⊕
    OTHER (0,1) { . . .

Figure 4: Example of a case statement with processes that are mutually exclusive and mutually independent.

[Diagram: the SELECT DAY OF THE WEEK case statement of figure 4, with the same cases in a different order.]

Figure 5: When a case statement has mutually independent and mutually exclusive processes, the order of the cases can be changed without changing the logic of the diagram.

ROLL IS A "4" {
    PLAYER HAS NO BODY (0,1) { SKIP
    ⊕
    PLAYER HAS NO NECK (0,1) { SKIP
    ⊕
    PLAYER HAS NO HEAD (0,1) { SKIP
    ⊕
    PLAYER ALREADY HAS TWO ANTENNAE (0,1) { SKIP
    ⊕
    NOT PLAYER ALREADY HAS TWO ANTENNAE (0,1) { GIVE PLAYER AN ANTENNA

Figure 6: Although this is a working Warnier-Orr diagram, the case statements are not mutually independent.

IF 'player has no body'
    THEN ...
ELSE IF 'player has no neck'
    THEN ...
ELSE IF 'player has no head'
    THEN ...
ELSE IF 'player has two antennae'
    THEN ...
ELSE 'give player one antenna'

Listing 1: Typical if-then-else structure for the Warnier-Orr diagram of figure 6.
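The same point can be seen in a language with structured selection: each case simply ends, and control falls through to the END PRICE step. In this minimal Python sketch the record layout, the use of price field #2 for code B and the compute_market_price helper are all hypothetical, since figure 3 does not say what code B uses or how the market price is found; the helper stands for the common routine described elsewhere in the system.

```python
def compute_market_price(item):
    """Hypothetical common utility routine, described elsewhere in the
    system (on the diagram: COMPUTE MARKET PRICE ...SEE PAGE #3)."""
    return item["market_price"]


def determine_unit_price(item):
    """DETERMINE UNIT PRICE: an exclusive-OR case on the product code."""
    code = item["product_code"]
    if code == "A":
        unit_price = item["price_fields"][0]      # price field #1
    elif code == "B":
        unit_price = item["price_fields"][1]      # assumed: price field #2
    elif code == "C":
        unit_price = compute_market_price(item)   # a call, not a GOTO
    else:
        unit_price = None                         # no case applies: (0,1)
    # END PRICE: every case arrives here without any GOTO
    return unit_price
```

The call to compute_market_price returns to the case statement when it finishes, which is what "...SEE PAGE #3" expresses and what a GOTO does not.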

Another common technical mistake is one that is a little harder to catch, and one that even professionals familiar with this technique will make if they aren't careful. Consider the case statement shown in figure 4.

Note that in this case statement, not only are the processes outlined mutually exclusive (only one of the cases is true), but they are also mutually independent: their order within the case statement does not matter. It would be just as correct for me to have written the diagram as shown in figure 5.
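Mutual independence can even be checked mechanically: if the cases really are exclusive and independent, testing them in any order selects the same process. The Python sketch below tries every ordering of the day-of-week cases of figure 4; the action strings and function names are hypothetical placeholders, since only the case conditions come from the figure.

```python
from itertools import permutations

# Case conditions from figure 4; the action strings are placeholders.
CASES = [
    ("MONDAY", lambda day: day == "MONDAY", "monday routine"),
    ("TUESDAY", lambda day: day == "TUESDAY", "tuesday routine"),
    ("FRIDAY", lambda day: day == "FRIDAY", "friday routine"),
    ("OTHER", lambda day: day not in ("MONDAY", "TUESDAY", "FRIDAY"), "other-day routine"),
]


def select(day, cases):
    """Take the first case whose condition holds, as an exclusive-OR case does."""
    for _name, condition, action in cases:
        if condition(day):
            return action
    return None


def order_independent(cases, inputs):
    """True if every ordering of the cases selects the same action for each input."""
    for day in inputs:
        results = {select(day, list(order)) for order in permutations(cases)}
        if len(results) != 1:
            return False
    return True
```

A pseudo-case such as the one discussed next fails this test, because moving one of its cases changes which action is chosen.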

In an earlier article, "Structured Program Design" (October 1977 BYTE)⁴, the game of BUG was outlined. In the game, a die is rolled for each player, and each number of the die corresponds to a part of the bug's body; the player finishing his bug first wins the game. If a player rolls a 4, for instance, he is entitled to one antenna. But he must have already acquired a body, a neck and a head, in that order, before he can receive an antenna. He needs a total of two antennae to complete a bug.

Many people would try to code that proc- 
ess as a case statement as in figure 6. The 
process in figure 6 certainly looks correct, 
and indeed, if you code it as a case state- 
ment, as in listing 1, it will even run correctly. 

However, this process is not a case state- 
ment. It is more properly called a pseudo- 
case statement, because each of its cases is 
mutually dependent. The cases cannot be 
reordered within the statement without 
destroying its logic. Notice that rearrange- 
ment of the case statement diagram as 
shown in figure 7 does not work at all. This 
arrangement will give the player an antenna 
anytime a four is rolled, until he has two 
antennae, regardless of whether or not he 
already has a body, a neck or a head. A more 
correct logical interpretation of the case 
structure we want is shown in figure 8. 

You might also notice that since the bug 
must have a body before it can have a neck 
(and a neck before it can have a head) if we 
merely check for the presence of the head, 
we will be indirectly checking for the neck 
and the body, so that figure 9 is an equivalent 
structure.
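Coding figures 6, 7 and 9 side by side makes the difference concrete. In the Python sketch below (the dictionary representation of a bug and the function names are assumptions), the pseudo-case of figure 6 is correct only in its original order, the rearrangement of figure 7 hands an antenna to a bug with no body, and the nested form of figure 9 needs only the head test.

```python
def roll_four_figure6(bug):
    """Figure 6: a pseudo-case whose cases only work in this exact order."""
    if not bug["body"]:
        pass                        # SKIP
    elif not bug["neck"]:
        pass                        # SKIP
    elif not bug["head"]:
        pass                        # SKIP
    elif bug["antennae"] == 2:
        pass                        # SKIP
    else:
        bug["antennae"] += 1        # GIVE PLAYER AN ANTENNA
    return bug


def roll_four_figure7(bug):
    """Figure 7: the same cases rearranged, with the antenna test first."""
    if bug["antennae"] != 2:
        bug["antennae"] += 1        # fires even when there is no body, neck or head
    # the remaining cases (no head, no body, no neck, 2 antennae) all SKIP
    return bug


def roll_four_figure9(bug):
    """Figure 9: a head implies a neck and a body, so only the head is tested."""
    if bug["head"]:
        if bug["antennae"] != 2:
            bug["antennae"] += 1    # GIVE PLAYER AN ANTENNA
        # PLAYER HAS 2 ANTENNAE: SKIP
    # PLAYER HAS NO HEAD: SKIP
    return bug
```

For a bug that has nothing yet, the figure 6 and figure 9 versions leave it alone, while the figure 7 version gives it an antenna, which is exactly the failure the rearranged diagram shows.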

Another common technical error is the misuse or lack of use of the (0,1) notation in conjunction with the exclusive OR, ⊕. Many times, people will simply write:

TEST X {
    CONDITION A
    ⊕
    CONDITION B

By this they often imply the (0,1) notation with the use of the symbol ⊕ alone. Actually, this is not incorrect; in fact, for most people familiar with the diagrams, this notation seems to be just as clear. But for users not quite familiar with the Warnier-Orr diagrams, it is probably best to go ahead and include the (0,1).

To conclude, I'll reiterate a point made in an earlier article: understanding a Warnier-Orr diagram is very easy; creating one from scratch is much harder than it looks.



ROLL IS A "4" {
    NOT PLAYER HAS 2 ANTENNAE (0,1) { GIVE PLAYER AN ANTENNA
    ⊕
    PLAYER HAS NO HEAD (0,1) { SKIP
    ⊕
    PLAYER HAS NO BODY (0,1) { SKIP
    ⊕
    PLAYER HAS NO NECK (0,1) { SKIP
    ⊕
    PLAYER HAS 2 ANTENNAE (0,1) { SKIP

Figure 7: When the statements in figure 6 are rearranged as shown, it can be seen that the program fails to work as desired.

ROLL IS A "4" {
    PLAYER HAS A BODY (0,1) {
        PLAYER HAS A NECK (0,1) {
            PLAYER HAS A HEAD (0,1) {
                PLAYER HAS 2 ANTENNAE (0,1) { SKIP
                ⊕
                NOT PLAYER HAS 2 ANTENNAE (0,1) { GIVE PLAYER AN ANTENNA
            ⊕
            NOT PLAYER HAS A HEAD (0,1) { SKIP
        ⊕
        NOT PLAYER HAS A NECK (0,1) { SKIP
    ⊕
    NOT PLAYER HAS A BODY (0,1) { SKIP

Figure 8: This method of approaching the stated "bug" problem is more logically correct than that of figure 6. All of the statements at each level of the diagram are mutually exclusive and mutually independent.

ROLL IS A "4" {
    PLAYER HAS A HEAD (0,1) {
        PLAYER HAS 2 ANTENNAE (0,1) { SKIP
        ⊕
        NOT PLAYER HAS 2 ANTENNAE (0,1) { GIVE PLAYER AN ANTENNA
    ⊕
    NOT PLAYER HAS A HEAD (0,1) { SKIP

Figure 9: Since a bug must have a head in order to have antennae, and a body and neck to have a head, the search process can be shortened by just checking for the presence of a head.
Top-Down
Modular 

Programming 



Albert D Hearn 



If you have done some programming, you know that it's one of the most enjoyable and satisfying parts of personal computer use. The very thought that the vast power in the small system's processor is limited only by the program that you write for it is tremendously exciting.

If you are new to the computer game, the programs you have written up to now have probably been relatively small and uncomplicated, but you have developed a lot of experience and confidence from them. Most likely you haven't used any particular technique in designing and writing your programs: you have probably approached program design in an informal way and relied upon your good sense to guide you in this unfamiliar task. You have probably also gained an understanding of the full capabilities of the instruction set and some of the little tricks (yes, ADDing a binary number to itself really does result in a left shift of one bit) which can be so useful. You are also capable of writing I/O routines to do about any kind of data transfer you want.

So now you are ready to do a program which does something really useful. The program you have in mind is going to be larger and more complicated than those you have done previously. While you might not expect this, your previous informal methods of designing and coding might be inadequate and could cause you much grief if you attempt to use them on a larger program.

Hopefully, I can help you prevent these 
kinds of difficulties by showing you in this 
article an easy to use method of designing 
and structuring larger programs which can 
greatly simplify your personal efforts, 
regardless of complexity. 



The Concept

Someone once said, "To solve a complex 
problem, simply break it down into a num- 
ber of less complex pieces, then proceed to 
solve it one piece at a time." This approach 
has been used for many years in the design 
and building of electronic equipment. It 
results in a "building block," or "modular" 
construction, where each block or module 
does some distinct part of the total function 
of the equipment. For instance, think of the 
last time you saw a diagram of a radio re- 
ceiver. It was probably in the form of a set 
of separate blocks representing the RF 
amplifier, mixer, IF amplifier, and so on. 
The blocks were all connected with flow 
lines showing the sequence in which each 
equipment module processed a signal coming 
from the antenna. The diagram enabled the 
reader to understand the function of the 
radio one module at a time, in relation to 
the whole radio. 

So how does the idea of using building 
blocks and solving problems piecemeal 
relate to the programming of personal 
computers? The answer is that these same 
ideas are very applicable to programming 
and have been in use in commercial pro- 
gramming for a number of years. There is 
no reason that good use of them can't be 
made in the amateur computer hobby also. 

Top-down Design

Top-down design of microprocessor programs requires that you first have a clear notion about what it is that you want the program to do. You should ask yourself questions like, "What function do I want performed?", "What input information is available?", and "What output information or action do I expect?" When you can answer these questions, you've actually completed the highest level of design.

[Diagram: a hierarchy of boxes in three levels: level 1 (highest), level 2, level 3 (lowest).]

Figure 1: A basic top-down design diagram is a structure like this. The number of levels may vary, and the number of boxes may vary, but the basic idea is given by this prototype.

[Diagram: implied inputs (checkbook, bank stmt, deposit slips, checks) → balance checkbook → implied outputs (comparison of checkbook and bank balances, errors, corrections).]

Figure 2: The first level of design is the act of saying "I want a program to do thus and so." Here "thus and so" is defined to mean checkbook balancing.

The basic principle of top-down design 
procedure says that you start at a very high 
level of function definition and then pro- 
gressively expand that function into more 
and more detail until you're at a low enough 
level to begin coding your program. 
Actually, this is a very natural way to 
design solutions to any problem, but, for 
some reason, this method was very slowly 
applied to programming. The top-down 
method is different from bottom-up, where 
the concern is for coding and details before 
a real program design has been done. Bot- 
tom-up methods work on the "how" aspects 
of the program before the "what" aspects. 
An analogy of this method would be the 
building of a house, using no structural 
plans, by first laying down a convenient 
foundation and then gradually adding 
wood and stone until some desirable struc- 
ture has evolved. 

Let's take an example of a function 
that could be performed on a microproc- 
essor system for the purpose of illustrating 
the technique of modular, top-down pro- 
gram design. The function, monthly check- 
book balancing, was selected because it is 
a process that is familiar to most of us and 
it contains all of the elements which make it
a good example. 

In order to design what you want the 
program to do, begin by drawing a multi- 
level design diagram like the one shown 



in figure 1. The diagram will describe what 
the program does at a number of different 
levels of detail, starting with the highest 
level which is a single block describing the 
overall function. The next lower level of 
blocks breaks the higher level function into 
a number of more detailed subfunctions. 
The next level takes those blocks and breaks 
them into even greater detail, and so on. 
An important point to remember is that the 
total function of the program is represented 
at each level. 

Figure 2 illustrates the first steps in the 
top-down design of your checkbook balanc- 
ing program. The first block simply states 
that the program will balance your check- 
book. There are no details in that block and 
it certainly doesn't invite coding at this 
point in the design. For input, you know 
that you will have your checkbook entries, 
monthly statement from the bank, deposit 
slips and cancelled checks. The output you 
want is a comparison of your checkbook 
balance (adjusted for recent deposits, ser- 
vice charges and outstanding checks) and the 
balance shown on the bank statement. You 
also want to know where any errors were 
made and what corrections are required. 

The second level of design, shown in figure 3, breaks the first level block into three major subfunctions. Although this subdivision could have been done differently in terms of the content of the second level blocks, the sum total of those functions always adds up to the entire function of the program. The idea is that you start the process slowly and don't attempt to develop too much detail too soon. Keep the number of subfunctions small, five or fewer, under each function block. Don't worry about the order in which these subfunctions will be performed in your program. Remember, you're only concerned at this point about what is to be done, not how it is to be done.

balance checkbook
    balance deposits
    balance checks and charges
    compare bank and checkbook balances

Figure 3: Once the first level of design has been determined, the next level is specified by breaking up the task into parts which are fundamentally independent of one another. Here, checkbook balancing is viewed as three separate modules of function.

(start)
 -> match deposits in checkbook against bank statement
 -> adjust bank balance for any late deposits
 -> match cancelled checks to checkbook entries and bank statement
 -> adjust checkbook balance for any bank charges
 -> adjust bank balance for any outstanding checks
 -> compare bank balance to checkbook balance
 -> determine any differences and correct mistakes
 -> (end)

. . . far structure of the application is determined in a hierarchy such as those exemplified in figures 1 to 4, then attention can be given to sequencing of functions. This flowchart shows general level sequencing of the checkbook balancing application.

balance checkbook
    balance checks and charges
        match cancelled checks to checkbook entries and bank statement
        adjust checkbook balance for any bank charges
        adjust bank balance for any outstanding checks

Figure 4: Carrying the process one step further, the next level is shown here for one of the branches of the structure of the program.

Next, take the design to the next lower
level by further subdividing each of the
second level blocks. Figure 4 illustrates a
portion of this step. Just make sure that
each subblock represents a complete sub-
function and that, taken together, the subfunctions at any
level are equivalent to the function they subdivide.
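To make the decomposition concrete, here is a minimal sketch in Python (chosen only for illustration; the article itself works in flowcharts and BASIC-like pseudocode). It assumes a hypothetical representation in which each block of the design diagram is a (name, subblocks) pair. It fills in the checkbook hierarchy from the figures, leaving the deposits branch unexpanded, and checks the five-or-fewer guideline. The order of the subblocks carries no meaning, in keeping with the "what, not how" rule.

```python
from typing import List, Tuple

# Hypothetical representation: a block is (name, list of subblocks).
# A leaf has no subblocks. Order carries no meaning here.
Block = Tuple[str, list]

design: Block = (
    "balance checkbook",
    [
        ("balance deposits", []),  # left unexpanded in this sketch
        ("balance checks and charges", [
            ("match cancelled checks to checkbook entries and bank statement", []),
            ("adjust checkbook balance for any bank charges", []),
            ("adjust bank balance for any outstanding checks", []),
        ]),
        ("compare bank and checkbook balances", []),
    ],
)

MAX_SUBFUNCTIONS = 5  # the "five or fewer" guideline from the text


def check_fanout(block: Block, limit: int = MAX_SUBFUNCTIONS) -> List[str]:
    """Return the names of blocks that have more than `limit` subblocks."""
    name, children = block
    too_wide = [name] if len(children) > limit else []
    for child in children:
        too_wide.extend(check_fanout(child, limit))
    return too_wide


def depth(block: Block) -> int:
    """Count the design levels from this block down to its deepest leaf."""
    _, children = block
    return 1 + max((depth(child) for child in children), default=0)


print(depth(design), check_fanout(design))  # prints: 3 []
```

An empty list from `check_fanout` means every block respects the limit; lowering the limit shows which blocks would need regrouping.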

You might ask at this point, "How many 
levels must I go through?", or "How do I 
know when to stop?" There is no precise 
answer to these questions, although the fol- 
lowing guidelines should help. In general, 
you will find that you should stop when the 
lowest level of functions is so simple that 
you can easily write a program module to 
do each one. A module should contain
about 50 program instructions or
fewer. Experience will help you to know when
you have reached this point. Also, you will
find that the more complex the program,
the more design levels you will need; generally,
about three or four levels will be
sufficient.

Another method of determining if you've 
carried the design to a low enough level 
comes about almost automatically. If you 
are attempting to complete one of the lower 
levels and you find that the order of sub- 
function execution is becoming difficult to 
ignore, then you've probably gone far 
enough. Also, if you find that it is becoming 
necessary to show that program branching 
or decision making is required (top-down de- 
sign diagrams should show no decision logic), 
then you know that you have about the
right level of design. You are now ready to
start thinking about the how of your 
program. 

Modular Construction

If you try to make each block at the 
lowest level of your design diagram into a 
module, you might determine that some 
blocks are simple and can be combined 
into fewer modules. On the other hand, 
there will probably be blocks which would
result in modules larger than the maximum
size of about 50 instructions we have established.
In this case, take the blocks through one or
more additional levels of design.

Now decide what sequence the functions 
should be performed in. Begin drawing a 
flowchart showing the required sequence. 
Will each function be performed for each 
pass through the program? If not, add deci- 
sion blocks showing the conditions under 
which each such function is executed. Also 
add any function blocks which may be 
necessary to initialize data, clear tables,
perform I/O, etc.

Figure 5 shows a sequence of functions 
which results from the design of your example
checkbook balancing program. Actually, the
the functions shown are probably too high 
level for this step, but for the sake of illustra- 
tion, the diagram should make the point. 

At this time, I would recommend that 
you consider making use of a special pro- 
gram structure called an executive routine, 
which offers some significant advantages. 
The executive is the main routine in the 
program and primarily contains calls to
the function modules which do all the
processing duties. It makes all decisions
about the sequence of execution. It also
contains the starting and ending points of
the program. The objective of the executive
is to concentrate most of the decision logic
and common functions of the program into
a separate routine which becomes another 
program module. 
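A minimal sketch of the idea in Python, under stated assumptions: the module names and the shared `state` dictionary are hypothetical stand-ins for the checkbook functions, not the article's code. Each function module does one job and returns. Only `executive` decides what runs next, including the one decision shown here.

```python
from typing import Dict

# Hypothetical program state shared by the function modules.
State = Dict[str, float]


def adjust_bank_for_late_deposits(state: State) -> None:
    """Function module: does one job, then returns to its caller."""
    state["bank"] += state["late_deposits"]


def adjust_book_for_charges(state: State) -> None:
    state["book"] -= state["bank_charges"]


def adjust_bank_for_outstanding_checks(state: State) -> None:
    state["bank"] -= state["outstanding_checks"]


def executive(state: State) -> float:
    """Single entry and exit point; the only place sequencing is decided.

    Returns the difference still to be explained (0.0 means balanced).
    """
    adjust_bank_for_late_deposits(state)
    if state["bank_charges"] > 0:          # a sequencing decision
        adjust_book_for_charges(state)
    adjust_bank_for_outstanding_checks(state)
    return round(state["bank"] - state["book"], 2)


example = {"bank": 500.0, "book": 430.0, "late_deposits": 50.0,
           "bank_charges": 5.0, "outstanding_checks": 125.0}
print(executive(example))  # 0.0
```

None of the modules calls another, so reordering or dropping a step means editing `executive` alone.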

In this way, the function modules need 
not, and should not, make sequencing deci- 
sions. They should never directly pass con- 
trol to another function module. This should 
be done only through the executive. A func- 
tion module's only responsibility is to be 
given control by the executive, do its 
assigned job, and then return control back to 
the executive. Function modules are written 
in the form of subroutines using the call and 
return facilities of the programming language 
being used. They should also contain a 
generous sprinkling of comment statements 
to ensure a high degree of understandability,
as well as a well-defined I/O interface to the
outside world and the rest of the program. 

Figure 6 illustrates the final step in the 
modular, top-down design of your check- 
book balancing program. You have added an 
executive routine and some necessary house- 
keeping routines. You could begin coding 
the program from this flowchart by first 
writing the executive and the associated 
subroutine calls for each of the processing 
modules. By writing dummy subroutines 
which simply return control when they are 
called, you can test your executive for cor- 
rect operation without the need for the real 
processing modules. 
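As a sketch of the dummy-subroutine technique (in Python, with hypothetical module names), each stub records that it was reached and returns at once. This lets the executive's sequencing be checked before any real processing module exists.

```python
from typing import Callable, List, Sequence

Module = Callable[[], None]


def executive(modules: Sequence[Module]) -> None:
    """Hypothetical executive: calls each processing module in turn."""
    for module in modules:
        module()


def make_stub(name: str, trace: List[str]) -> Module:
    """Build a dummy subroutine that records its call and returns at once."""
    def stub() -> None:
        trace.append(name)  # no real processing yet
    return stub


trace: List[str] = []
names = ["initialize", "balance deposits",
         "balance checks and charges", "compare balances"]
executive([make_stub(name, trace) for name in names])
print(trace)  # shows the order in which the stubs were reached
```

As each real module is written, it replaces its stub without any change to the executive.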

The next step, of course, is writing the 
processing routines. This is simplified by the 
design approach described in this article 
because it allows you to work on each routine
as a separate unit which can be written and
tested independently of all other routines in
the program. When all routines are completed,
they simply plug into the executive
to form a total program. Later, if you want
to change the sequence of execution, or add or
delete functions, it can be simply a matter of
manipulating modular routines.

Figure 6: While the sequencing of the diagram shown in figure 5 is adequate,
it is often useful to explicitly partition all sequencing of execution in a
separate module called the "executive" for the application. This flowchart
shows a simple example of such an executive program which sequences the
major operations of the application.
Albert D Hearn



Microprocessor programming, at this 
point in time, is a black art. Once you have 
learned the basic instruction set, you're on 
your own. Some people get the knack of 
this mysterious task fairly quickly, and 
some do not. Those who do well seem to 
have developed some sort of system for 
going about it. The point is that an or- 
ganized, systematic approach is required 
if there is any hope for continued program- 
ming success. The purpose of this article 
is to describe to you one such method
which has become very popular with pro- 
grammers of all types, using all kinds of 
computers from micros to the giants. 

Concept 

What we're looking for is simplicity in 
the writing of programs. This is usually 
achieved if the program can be reduced 
to a collection of basic components which 
fit together in very well-defined ways. This 
is the concept behind structured program- 
ming. 

Any program can be considered to have 
only two basic building blocks. One is the 



process block shown in figure 1. It simply 
performs some defined function, or proc- 
ess. It might represent a simple function 
requiring only a few, maybe only one, in- 
structions in the program, or a much larger 
function requiring many instructions. What- 
ever it does, it has one input and one output. 
The second basic block is the decision 
block shown in figure 2. This elementary 
capability of any computer is that which 
gives it all its power and flexibility. It is
the ability to choose between two paths in a
program based upon the value of some
parameter or condition which can be tested 
by certain instruction types. For example, 
two numbers can be compared and a test for 
equality used to decide which of two pro- 
gram paths will be taken as a result. 

These two fundamental building blocks 
will now be used in the construction of a 
set of basic program structures with which 
any other program can be built. The three 
general structures are called sequence,
if-then-else and loop. Variations of these will
be examined, as well as combinations which 
can be used to build more complex functions. 



Figure 1: The process block is the "black box" of programming: it is entered 
by a single input path, does some arbitrary operations upon data, and is 
exited by a single output path. The "arbitrary operations" can be as simple 
as one step in an arithmetic calculation, or as complex as a compilation of a 
program; it all depends on the point of view taken.
Figure 2: The decision block is a simpler concept than the process block, in
the sense that the amount of computation required rarely approaches the 
generality of an "arbitrary process." A decision block has one input and, 
depending upon a binary condition, takes one of two output paths. In this 
figure, the names "true," "then" and "yes" denote one possible path; the 
names "false," "else" and "no" describe the other possible path. In pro- 
gramming languages, the "then" or "else" terminology for the two paths 
is frequently built into the language design; the other terms are frequently 
seen in flowchart representations of programs. 
Editor's Note: BYTE Flowchart Flow
Conventions

As an "ideal" standard, flowcharts
in BYTE use a direction of flow con-
vention as follows:

Default flow: Vertical flow is
from the top of a diagram toward
the bottom, and horizontal flow
is from the left of a diagram
towards the right, unless explicit
flow is used. Thus:

[Diagram: implied flow, shown by arrows]

Explicit flow: Vertical flow
upward, or horizontal flow
leftward in a drawing, is shown with
an explicit arrow at the end of the
flow path, thus:

[Diagram: explicit flow; arrow at end of path with horizontal flow to left or vertical path upward]

Merged flow: When two or
three paths of flow merge, the
two or three inputs to the joint
path have arrows noted:
Figure 3: The sequence structure is the simplest programming structure.
It can be viewed from the outside as the equivalent of a process block, but 
upon close examination it is found to contain one or more process blocks. 



Basic Structures 

The simplest of the program structures, 
shown in figure 3, is the sequence structure, 
which is composed of one or more process 
blocks strung together serially. Like the 
process block from which it is built, the 
sequence structure has only one input path 
and one output path. In fact, you will soon 
see that one of the rules that we want all 
structures to conform to is that they have a 
single input path and output path. Further- 
more, an entire program, which can be rep- 
resented by one large process block, should 
also conform to this rule. 

The next structure is the if-then-else 
structure, shown in figure 4. It consists of 
a decision block and two process blocks. 
Only one of the process blocks is executed 
for any single pass through the structure. 
The result of the test or comparison repre-
sented by the decision block determines
which process block is chosen. Notice that
regardless of which path is taken there is
one common exit path from the if-then-else
structure. This is required to maintain our
single exit philosophy.

Figure 4: The if-then-else structure is composed of a decision block and two
process blocks. The process blocks may themselves be viewed as any form of
structure with a single input and a single output path, and thus might in fact
be sequence structures, if-then-else structures, etc.

An if-then-else structure does exactly 
what it says: if a condition is true, then 
take a specified action, else take a specified 
alternate action. However, there are times 
when only one action is required in only 
one of the paths. No action is necessary 
in the other path. In an actual flow dia- 
gram, this is of course shown by drawing 
a flow line in place of one or the other 
process block of the if-then-else structure 
since the most trivial process is simply 
going to the next process without doing 
anything. Note however that only one of the 
process blocks can be made up of this 
simplest case of "do nothing" since if both 
process blocks were eliminated from the 
if-then-else structure, the net effect would
be to "do nothing" all the time, whether
the condition was true or false.
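A small Python illustration (the function names are hypothetical) of the two shapes just described. The first is a full if-then-else. The second has one leg reduced to "do nothing", so a false condition flows straight on to the common exit.

```python
def classify(n: int) -> str:
    # Full if-then-else: exactly one process block runs on each pass,
    # and both paths rejoin at the single exit (the return below).
    if n % 2 == 0:
        label = "even"   # then-part
    else:
        label = "odd"    # else-part
    return label


def magnitude(n: int) -> int:
    # One leg is the trivial "do nothing" process: there is no else-part,
    # so a false condition simply continues to the common exit.
    if n < 0:
        n = -n           # then-part
    return n
```

Removing the then-part of `magnitude` as well would leave a test that changes nothing, which is the degenerate case the text warns against.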

The if part of an if-then-else structure is 
simply any program instruction which can 
perform a test and take one of two paths 
depending upon the outcome. In an as- 
sembly language, this is usually a condi- 
tional jump or a branch instruction based 
upon the outcome of some comparison, 
arithmetic operation or other operation 
which affects processor status flags used in 
such branches and jumps. The branching
instruction specifies the address at which
one path begins (which path is the then leg
and which is the else leg is an arbitrary choice),
and the next sequential instruction is as-
sumed to begin the opposite path.

Some higher level languages like BASIC 
have ready-made if-then-else instructions. 
BASIC has IF and THEN; ELSE is implied. 
The following shows how an if-then-else 
would look in BASIC: 



     1 IF X=Y THEN 10
       ... FALSE PART ...
       GOTO 15
    10 ... TRUE PART ...
    15 END

In this example, the else code immediately 
follows the IF instruction. The GOTO 15 
ends the else path and causes the program 
to branch to the common exit point at 
line 15. The then path starts at line 10 and 
ends at line 15. [BASIC is considered to be
an "unstructured" language because of
the need for an explicit GOTO following
the "false" part of an IF-THEN-ELSE
construction.]
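To see why the GOTO 15 matters, here is a toy interpreter sketch in Python for a made-up mini-BASIC. The statement encoding and the line numbers 2 and 3 are hypothetical, since the original fragment does not number those lines. It runs the fragment with and without the GOTO. Without it, the false part falls straight through into the true part.

```python
from typing import List, Optional, Tuple

# Hypothetical mini-BASIC statement: (line number, operation, argument).
Statement = Tuple[int, str, Optional[object]]


def run(program: List[Statement], x: int, y: int) -> List[str]:
    """Execute the program and return the parts that were performed."""
    line_numbers = [number for number, _, _ in program]
    performed: List[str] = []
    pc = 0  # index of the statement being executed
    while pc < len(program):
        _, op, arg = program[pc]
        if op == "IF X=Y THEN" and x == y:
            pc = line_numbers.index(arg)   # jump to the then path
            continue
        if op == "GOTO":
            pc = line_numbers.index(arg)
            continue
        if op == "PART":
            performed.append(str(arg))
        if op == "END":
            break
        pc += 1
    return performed


WITH_GOTO: List[Statement] = [
    (1, "IF X=Y THEN", 10),
    (2, "PART", "false part"),
    (3, "GOTO", 15),          # ends the else path
    (10, "PART", "true part"),
    (15, "END", None),
]
# The same program with the GOTO 15 left out.
WITHOUT_GOTO = [s for s in WITH_GOTO if s[1] != "GOTO"]

print(run(WITH_GOTO, 1, 2))     # ['false part']
print(run(WITHOUT_GOTO, 1, 2))  # ['false part', 'true part']
```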

If you use assembly language in your 
programming, and your assembler has a 
macroinstruction capability, then you can 
write your own if-then-else macros. It is
beyond the scope of this article to describe 
how this is done, but it isn't very difficult. 

If you use assembly language and don't 
have facilities for writing macros, then you 
can simulate the function of the macro- 
assembler in order to gain the advantages 
of structured programming. Simply sit 
down and write yourself a set of standard 
if-then-else structures. Take the five or
six most common decision types (equal, 
not equal, zero, greater than, etc) and write 
skeleton programs for each. Leave blanks 
for the actual condition to be tested, and 
leave space for the actual code which will 
perform the then and else functions. Later, 
when you need an if-then-else while writing 
a program, you can draw upon your set of 
prewritten structures. Not only does this 
eliminate your having to invent similar pro- 
gram sequences over and over again, but it 
also prevents many bugs and greatly eases 
the effort you have to put into program 
writing. 
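One way to keep such a set of skeletons is sketched below, generated by a short Python script rather than written out by hand. The mnemonics are an assumption: they follow Intel 8080 conventions, where CMP sets the zero flag when A equals the operand and sets carry when A is less (unsigned), and ORA A sets zero when A is zero. The condition names and label scheme are hypothetical; adapt the table to your own processor. Each jump listed is the one that skips the then-part, so it tests the inverse of the stated condition.

```python
# Assumed Intel 8080-style mnemonics; the jumps listed for each decision
# type are taken when the condition is FALSE (they skip the then-part).
SKIP_THEN = {
    "equal": ["JNZ"],
    "not equal": ["JZ"],
    "less than": ["JNC"],          # unsigned: carry set means A < operand
    "greater or equal": ["JC"],
    "greater than": ["JC", "JZ"],  # false if A < operand or A = operand
    "zero": ["JNZ"],               # flags set by ORA A
}


def skeleton(condition: str, label: str) -> str:
    """Return an if-then-else skeleton; '...' marks blanks to fill in."""
    lines = [f"        ...            ; set flags: test for {condition}"]
    for jump in SKIP_THEN[condition]:
        lines.append(f"        {jump:4} {label}E   ; condition false: go to else")
    lines += [
        "        ...            ; then-part",
        f"        JMP  {label}X   ; skip the else-part",
        f"{label}E:  ...            ; else-part",
        f"{label}X:  ...            ; common exit",
    ]
    return "\n".join(lines)


print(skeleton("greater than", "IF1"))
```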

The last basic structure is the loop, 
which provides a means of repeating a se- 
quence of instructions until some stop 
condition is found to exist. There are two 
kinds of loop structures: do-until and
do-while. 

A do-until structure, shown in figure 5, 
performs the function in the process block 
at least once. After that, a test is done to 
determine if the condition for stopping the 
process looping has been found true. As 
long as the condition is not true, the loop- 
ing continues. When it becomes true the 
looping ends and the exit path is taken. 
This type of structure can be used, for 
example, when you need to search a table of 
values, looking for a particular value. If you 
know that the table will always contain a 
matching entry, the program routine need
not be complicated by logic to detect
end-of-table before a matching value is
found. Notice that the first table entry is 
always examined before the decision is made 
to continue (this is because the ending 
condition decision is based upon the value 
of that entry). 
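A do-until table search can be sketched in Python as follows. Python has no do-until statement, so it is emulated with `while True` and a test at the bottom. As in the text, the sketch assumes the target is always present. It deliberately omits an end-of-table test, so a missing target raises `IndexError`.

```python
from typing import Sequence


def find_index(table: Sequence[int], target: int) -> int:
    """Do-until search; assumes `target` is always somewhere in `table`."""
    i = 0
    while True:                       # emulated do-until loop
        found = table[i] == target    # process block: examine an entry
        if found:                     # until condition, tested afterwards
            break
        i += 1
    return i


print(find_index([7, 3, 9, 4], 9))  # 2
```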

The second type of loop is the do-while, 
shown in figure 6. The difference between 
this and the do-until structure is that the 
test is done before the process block is 
executed. In many cases there is not a lot 
of significance to this difference because 
both types of structures can do the same 
jobs.

In specific situations you will find that 
one form will usually be better suited or 
more convenient than the other. The pri- 
mary difference to remember is that the 
do-until form always executes the process
block at least once whether or not the
until condition is true, and that the do-
while may not execute the process at all if
the while condition is false at the time of
the first test. Experience will best teach
you which to use in the various situations. 
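The difference is easiest to see when the stopping condition already holds on entry. In this Python sketch (hypothetical function names), each version counts how many times its process block runs while counting `n` down to zero.

```python
def do_while_count(n: int) -> int:
    """Do-while: the test comes first, so the body may run zero times."""
    runs = 0
    while n > 0:          # while condition
        runs += 1         # process block
        n -= 1
    return runs


def do_until_count(n: int) -> int:
    """Do-until: the body comes first, so it always runs at least once."""
    runs = 0
    while True:
        runs += 1         # process block
        n -= 1
        if n <= 0:        # until condition
            break
    return runs


print(do_while_count(0), do_until_count(0))  # 0 1
```

For a positive starting value such as 3, both versions run the body three times. They differ only in the zero case.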

A variation of either loop structure is
the endless loop, or do-forever. This form
of loop occurs
when the while or until condition is never 
changed to allow execution of the output 
path of the structure. Intentional endless 
loops are occasionally used, as in the low 
level programming trick of hanging up 
execution in a tight loop to flag errors, 
or the quite legitimate endless loops which 
form the outer level of control of a typical 
executive or monitor program. But for most 
programming purposes, an endless loop is a 
bug or error in the program. 

An Example 

Now using the basic structures, we can 
construct a program of any size and com- 
plexity by combining and nesting in any 
manner as long as some fundamental rules 
are adhered to: 

• The program as a process should have 
only one input path and one output 
path. 

• Structures within the program can
be nested but each structure must be 
totally contained within the structure 
in which it is being nested (this will be 
illustrated later). 

• There should be no branching unless 
it is part of a structure (for example, 
the GOTOs required in languages like 
BASIC). 

• Refrain from attempting to optimize 
the program by violating the above 
rules. There is a right time for this 
later. 

Before we look at an example of struc- 
turing a program, let's first look at how 
nesting of basic structures works. Figure 7 
shows a flowchart of a program which, 
overall, could be represented by a single 
if-then-else. But when it is looked at in more
detail, the else leg contains another if-then-else
as part of the instruction sequence
there; the else leg of that structure con-
tains yet another if-then-else. The heavy
outlines show that each of the nested struc-
tures is totally enclosed by its parent
structure; there is no overlap. A BASIC-
like program to perform the function shown
in figure 7 appears as listing 1. Again, I
use outlines to illustrate that each structure
is embedded in its entirety within another
higher level structure. Notice that I have
used indentation of lines to increase the
readability of the program. Each separate
structure should be at a different level of
indentation than its parent.

Figure 5: The do-until structure is a looping form whose purpose is to exe-
cute a given process block at least once. After executing the process block,
the "until condition" is tested and if found to be false, execution loops back
to repeat the process block before testing the condition again.

Figure 6: The do-while structure is a looping form whose purpose is to exe-
cute a given process block only if the "while condition" is true. Thus it can
execute the process block zero times if the condition is false initially, or an
arbitrary number of times so long as the condition remains true during re-
peated execution of the process block.
Listing 1: A BASIC-like program equivalent for the flowchart of figure 7. The
lines in the picture emphasize the structured programming formalism.
Figure 7: The various types of structures can be nested by noting that any
place where a process block is indicated, a more complex structure can be
used, since it, too, has only one input path and one output path of execution.
Thus, for example, this flowchart shows nesting of a process Q block and an
if-then-else structure as the else part of the if-then-else structure with condi-
tion X=Y?. This second if-then-else in turn has a third if-then-else as part of
its else part. The outlines show the nesting of one structure within another.
Figure 8: An unstructured
flowchart performs an end- 
less process as might be 
implemented in an auto- 
mobile interlock. This is a 
complete and viable solu- 
tion of the problem, but it 
involves numerous branch- 
ing operations performed 
in an uncontrolled (GOTO) 
fashion. 
Figure 9: Taking the algorithm of figure 8 and casting it into a standardized, structured pro-
gramming form eliminates all GOTO operations in languages with a complete if-then-else
structure, and in languages like BASIC, reduces use of GOTO operations to standardized struc-
tures. In this flowchart, we've positioned all the blocks to emphasize the nesting of structure.
One of the primary reasons for the emphasis on structured programming is the communication
of ideas to other programmers (or to the originating programmer at a later date). The claim
is made that a flowchart like this one, and its equivalent representation in listing 2, provide a
standardized way of communicating algorithms which makes the listing or chart easier to under-
stand and read.
Listing 2: A BASIC-like
application program for
activating a buzzer of an
automobile given several
conditions. A subroutine
BUZZ is called to sound
a noise during the loop.
In this BASIC-like repre-
sentation, several liberties
with syntax have been
taken.



Let's look now at an example of a simple 
program and show how a structured version 
might differ from an unstructured version. 

The program is one which might be part 
of a future automobile computer control 
system using a microprocessor. Its purpose
is to trigger a buzzer if the ignition key is 
left in the lock when the left front door is 
opened, or if the headlights are left on 
when the key is not in the lock. A delay is 
performed before conditions are checked 
again. 

The flowchart in figure 8 shows how we 
might have drawn it without attempting to 
apply any of the principles of structured 
programming. Now, look at figure 9 which 
shows the structured version. Both forms 
of the program do the same function, but 
the structured form is clearly more straight- 
forward and easier to write code from. 

Basically, a number of things happened 
to the flowchart when it was structured. 
First, all the branches (or GOTOs) became 
forward branches except those in loop 
structures. This allows for reading the chart 
from top to bottom in an orderly way. 
Secondly, each decision block and process 
block has been put into a proper structure 
and nested totally within its parent struc- 
ture. Thirdly, every structure regardless 
of its place in the overall program has only 
one input and one output. 

One thing has happened that might ap- 
pear to be a little strange to you. The se- 
quence structure which performs the buzzer 
function appears twice now, where it only 
appeared once before. This is necessary in 
order to keep the structure clean. Remember,
you cannot simply branch into the
other buzzer block because those two 
structures would then overlap. The inef- 
ficiency implied by the double appearance 
of that block might bother you, but it will 
probably turn out that the block will be 
written as a subroutine and the only in- 
efficiency will be an extra call instruction. 
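A minimal Python sketch of one pass through the structured interlock test. The nesting is inferred from the prose description of the problem and may not match figure 9 box for box. The sensor flags and the `buzz` callback are hypothetical stand-ins for the car's inputs and the BUZZ subroutine. The buzzer action appears in two places, but both are just calls to the same subroutine.

```python
from typing import Callable


def interlock_pass(key_in_lock: bool, door_open: bool, lights_on: bool,
                   buzz: Callable[[], None]) -> None:
    """One pass through the interlock test as nested if-then-else structures.

    Each structure has one entry and one exit; neither branch jumps into
    the other's buzzer block.
    """
    if key_in_lock:
        if door_open:
            buzz()        # key left in the lock with the door open
    else:
        if lights_on:
            buzz()        # headlights left on with the key removed
```

In the full program this pass would sit inside an endless loop, with the delay run between passes.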

Listing 2 is a BASIC-like program for the 
structured flowchart. (Here "BASIC-like" 
means using the syntax of BASIC but 
allowing variable names to be many char- 
acters in length for purposes of illustrating 
their meaning.) I have not attempted to 
make the program complete and have taken 
some liberties in order to illustrate my 
points. 

A few words of explanation are in order. 
First, the instructions at lines 3, 4 and 5 
represent a do-until structure which is used 
to implement a delay by simply increment- 
ing a counter (X) until it reaches a large 
value. The name BUZZ represents the line 
number of a subroutine (not shown) which 
activates an electronic buzzer in the car's 
dash. 
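The delay loop described above can be sketched in Python as a do-until that increments a counter X until it reaches a limit. The `limit` parameter is a hypothetical stand-in for the "large value". The function returns the pass count so its behavior can be checked. On a real microprocessor the time spent depends on instruction timings.

```python
def delay(limit: int) -> int:
    """Busy-wait delay: increment x until it reaches `limit` (do-until)."""
    x = 0
    while True:
        x += 1            # process block: increment the counter
        if x >= limit:    # until condition: counter reached the limit
            break
    return x


print(delay(1000))  # 1000
```

Because the test comes after the increment, `delay(0)` still makes one pass, which is the usual do-until behavior.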

Now is the time to go back and look at 
the program to make it more efficient in 
its operation or in the amount of memory 
required. This should be done only if it is 
absolutely necessary. If it is necessary, try 
to maintain the structuring to the extent 
that it doesn't destroy the clarity of the 
program or increase its complexity. In our 
example program, notice that there are 
three CONTINUE instructions at lines 
13, 14 and 15 leading to a GOTO at line 
16. The speed of the routine can be im- 
proved and the memory requirements can 
be reduced by eliminating the CONTINUEs
and changing any instruction which refer- 
ences any of them to go to line 16. Alter- 
natively, you could change each of those 
references to go directly to line 1 although 
you would be seriously interfering with the 
intent of structuring. 

In conclusion, I invite you to try the 
techniques described in this article when you 
write your next program. If you have done 
it any other way before, it takes a little 
getting used to, but I think you will ulti- 
mately agree that it has a lot to offer. 
Hopefully, you will see the benefits in the 
form of less time spent getting your pro- 
gram designed, written and debugged. In 
short, I believe that it can help make pro- 
gramming even more enjoyable.
Applied Structured Programming
. . .and How to Use It: Part 1 



Gregg Williams 



Regardless of whether you're a newcomer 
to computers or a devoted computer en- 
thusiast who occasionally manages to dream 
in hexadecimal, one thing is true: there's 
always room for improvement in your 
programming. Unless you are exceptionally 
orderly, you probably dive right into flow- 
charting or coding a problem, erase and do a 
lot of filling in when you remember some- 
thing you hadn't thought of earlier, wind up 
with a final program where insertions choke 
your original code like weeds in a garden, 
and spend too much time correcting mis- 
takes you think you shouldn't have made. 
And when you finish the program that (you 
think) is finally running, you complain to 
yourself, "There has got to be a better 
way!" 

There is, and it's called structured pro- 
gramming. No, it isn't a language, and it 
isn't just a way to write a program. It's a 
philosophy of design that pays close atten- 
tion to how a program comes to be written 
and tries to suggest ways to do each step 
more efficiently. 

Structured programming evolved less than 
ten years ago, when computer programmers 
finally faced programs too big to under- 
stand, when the cost of writing and de- 
bugging software began to exceed the ex- 
pense of using extra hardware and computer 
time. The school of structured programming 
evolved after E W Dijkstra voiced several 
thought provoking opinions, one of which 
was that many of our programming prob- 
lems are caused by (over)use of the un- 
restricted GOTO statement, which is present 
in every high level language from COBOL to 
FORTRAN. And many people nodded their 
heads enthusiastically, for who hasn't traced 
the bug in a program to an unexpected 
juxtaposition of GOTOs? 

Structured programming seems to help 
the habitual problems of even the most 
conscientious programmer, problems like 
how to write and debug a large program, 
how to fix (or better, to keep from hap- 



pening in the first place) bugs in programs 
that crash after working correctly for 
months or years, or how to add to an 
already working program without causing 
unexpected side effects. But how do you do 
structured programming in a language that 
permits unlimited GOTOs? 

Specifically, how do you do structured 
programming in BASIC, which is the uni- 
versal language for the microcomputer 
user? Simple - you use GOTOs to imple- 
ment the three basic structured program- 
ming structures that can theoretically repre- 
sent any problem — sequence, do. . .while 
and If. . .then. . .else - and use GOTOs 
for nothing else. 

What's the catch? You have to make 
yourself do it. The trade-off is simple: 
some discipline, a bit more planning in the 
early stages of designing a program, maybe 
a few extra lines of code in exchange for less 
total time spent in programming and getting 
a new program to work, less time spent de- 
bugging, and less chance of unexpected 
"blowups" happening later. It does seem to 
take more time, but that's because your 
lazy brain is protesting the exercise of little 
gray cells in thinking out a program and 
applying discipline; but the total time can 
be less, and the total frustration less. 

I've been doing structured programming 
for some time as I write this (I got into it 
largely due to the complexity of the pro- 
gramming job I have at work), and now I 
wonder why anyone gets let out of program- 
ming classes without learning structured 
programming as the Gospel according to 
Dijkstra. Still, I've found several places 
where by-the-book structured programming 
is a bit awkward; and since I do have GOTOs
to work with in BASIC, I found it hard to 
justify not using them when they can be 
used to simplify a program while still re- 
taining the properties of "straight" struc- 
tured programs. 

This article, then, will deal with using 
structured programming in BASIC (with
emphasis on the word using) and with some
extra programming structures I have found
useful.

(a)  Symbol    Its inverse
     =         ≠
     >         ≤
     ≥         <
     and       or

(b)  Conditional expression    Its inverse
     X3 > 5                    X3 ≤ 5
     X = 5 or Y ≤ 0            X ≠ 5 and Y > 0
     R ≥ (S + 3)               R < (S + 3)
     K2 > K3 and K2 > K4       K2 ≤ K3 or K2 ≤ K4

Figure 1: Generating the inverse of a conditional expression. When converting structured
pseudocode to BASIC code, the logical inverse of a conditional expression must often be
inserted in an IF statement. At (a) is a table where each line contains two symbols, each of which
is the inverse of the other. The rule for creating the inverse of a given expression is to replace
every occurrence of the eight symbols in (a) with its inverse; (b) shows some examples of this.



Some Preliminaries 



Before I begin, I need to get two points 
out of the way. The first has to do with a 
property of the three basic control struc- 
tures that must be duplicated by any pro- 
posed control structure for the latter to be 
suitable for a structured program. (I'm 
pointing this out to justify my additions 
and modifications to the three basic control 
structures.) In a word, each of the three 
structures in strict structured programming 
has the property called "one-in, one-out." 
This means that every time control of the 
program passes through this block (I will be 
calling the code between the boundaries of 
a control structure a "block"), it always 
starts with the same (first) statement and 
always exits through the same (last) statement;
in other words, only one way in,
only one way out. This allows a program to 
be constructed like a series of beads on a 
string, each of which can be examined and 
changed without inadvertently changing any 
of the others. (This is another property of 
structured programming control structures, 
the functional independence of each 
module. Given the same input, a module 
should perform the same operations regard- 
less of what has happened in previous 
blocks.) 

The second point is simply one of definition. In structured programming, we have situations where a block of code is done when a certain relationship holds true; if it is false, we do not do that block of code. These relationships, called conditional or relational expressions, are true or false depending on the current values of variables contained in the expression; examples are X1 = Y1, B > 3, D + K1 ≠ K2. In structured programming, we do a block of code when a certain condition is met; to express this in BASIC, we must use the IF statement to branch around the same code when the condition is not met, that is, when the logical opposite or inverse of the same expression is true.

Several examples will help you here. When is X > 5 false (for what values of X)? When is K1 ≠ 3 false? When is C ≥ 100 false? The answers are respectively when X ≤ 5, K1 = 3, and C < 100. Why? Because the opposite of "greater than" (as in X > 5) is "less than or equal to"; the opposite of "not equal to" (as in K1 ≠ 3) is "equal to"; and the opposite of "greater than or equal to" (as in C ≥ 100) is "less than." And the converse is true as well, that is, the opposite of "less than or equal to" is "greater than," and so on. This also works with interchanging the logical connectives AND and OR (for example, the opposite of "G > 5 AND A1 = 0" is "G ≤ 5 OR A1 ≠ 0"). The justification for this line of reasoning can be found in any book on elementary logic (see DeMorgan's Law or DeMorgan's Theorem, as it is variously known).

Because of all this, a simple table (see figure 1) allows us to find the inverse of a conditional expression. The rule to apply is: for a given expression, replace every occurrence of the symbols <, >, ≤, ≥, =, ≠, AND and OR with the other symbol in the same row; the new expression is now the inverse of the first conditional expression.
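The table-lookup rule is mechanical enough to automate. The following Python sketch is illustrative only and not part of the original article. It assumes ASCII BASIC spellings (<= for ≤, >= for ≥, <> for ≠), uppercase AND and OR, no NOT operator, and parentheses around any group where AND and OR are mixed; the function name invert_condition is hypothetical.

```python
import re

# Each symbol maps to its logical inverse (the rows of figure 1), written
# with ASCII BASIC spellings: <= for "less than or equal", and so on.
INVERSE = {
    "<": ">=", ">=": "<",
    ">": "<=", "<=": ">",
    "=": "<>", "<>": "=",
    "AND": "OR", "OR": "AND",
}

# Two-character operators are listed before one-character ones so that
# "<=" is never split into "<" and "="; AND/OR match only as whole words.
TOKEN = re.compile(r"<=|>=|<>|<|>|=|\bAND\b|\bOR\b")


def invert_condition(expr: str) -> str:
    """Return the inverse of a BASIC-style condition by swapping every
    relational operator and connective for its partner in the table.

    Assumes no NOT operator and that mixed AND/OR groups are parenthesized.
    """
    return TOKEN.sub(lambda m: INVERSE[m.group(0)], expr)


if __name__ == "__main__":
    for condition in ["X>5", "K1<>3", "C>=100", "G>5 AND A1=0"]:
        print(condition, "->", invert_condition(condition))
```

For example, invert_condition("G>5 AND A1=0") gives "G<=5 OR A1<>0", matching the example above. The parenthesization assumption matters because most BASICs evaluate AND before OR: swapping the symbols in A AND B OR C without parentheses yields an expression that groups differently from the true inverse.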

Putting It in BASIC 

Once we have the three basic control structures of sequence, if. . .then. . .else and do. . .while, we can look back to the moment before their invention and say, yes, because we are time-bound creatures tied to the idea of serial or time-ordered cause and effect, how else could we do anything? One either does tasks in sequence, or does a task until it no longer needs doing, or does one thing if something is true and another thing if it is not. How else can you decide how to do a thing? (Unfortunately, when people stopped doing things by hand and programmed a computer to do them "for them," this intuitive causality was sometimes left behind. It's fitting that we returned to this intuitive causality only when the problem of writing computer programs was itself attacked as a problem.)

The three basic control structures written 
in convenient pseudocode are listed with 
their flowchart equivalents in figure 2. Note 
that, in an if. . .then. . .else sequence, in no 
way can both blocks 1 and 2 be done during 
the same pass, and that the decision whether 
to do a block 1 through n times in the 
do. . .while is made at the beginning of the 
block so that it is possible for a do. . .while 
block to do the enclosed blocks zero times. 

Given these three control structures, 
every problem must be broken into varying 
levels of subproblems, each of which can 
ultimately be expressed as a combination of 
straight sequence, if. . .then. . .else sequences 
and do. . .while loops. How would you 
initially break these problems down using 
the above control structures? 

1. Given a number N, print the num- 
ber, its reciprocal, and -1 times 
the number; 

2. Average five test scores A, B, C, D, 
and E, to an average of V, including 
a 5 point curve if the initial average 
is below 70 points; 

3. Print out the reciprocals of the num- 
bers 1, 2, 3, . . . while the reciprocals 
are greater than 0.005. 



Figure 2: The three basic control structures including pseudocode and flowchart equivalents: (a) sequence, (b) if. . .then. . .else, (c) do. . .while. A "block" of code is one or more statements and/or do. . .while and if. . .then. . .else structures. (Flowcharts not reproduced.)

(a)
    block 1
    block 2
      .
      .
    block n

(b)
    if (condition)
    then
        block 1
    else
        block 2
    endif

(c)
    do while (condition)
        block 1
    endwhile



Figure 3: Three problems with their solutions in structured pseudocode (see text).

(a) Given a number N, print the number, its reciprocal, and -1 times the number:
    print N
    print 1/N
    print -N

(b) Average five test scores A, B, C, D, and E, to an average of V, including a 5 point curve if the initial average is below 70 points:
    V = (A+B+C+D+E)/5
    if V < 70
    then V = V + 5
    endif

(c) Print out the reciprocals of the numbers 1, 2, 3, . . . while the reciprocals are greater than 0.005:
    N = 1
    do while (1/N) > 0.005
        print 1/N
        N = N + 1
    endwhile
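For readers who want to run the three solutions, here is a hedged Python rendering of figure 3 (not from the original article). Python's own if and while statements are direct counterparts of if. . .then. . .else and do. . .while; the functions return their results instead of printing them so the behavior is easy to check, and the function names are hypothetical.

```python
def problem_1(n):
    """Figure 3a (sequence): the number, its reciprocal, and -1 times it."""
    return [n, 1 / n, -n]


def problem_2(a, b, c, d, e):
    """Figure 3b (if...then with no else): average, curved by 5 below 70."""
    v = (a + b + c + d + e) / 5
    if v < 70:
        v = v + 5
    return v


def problem_3(limit=0.005):
    """Figure 3c (do...while): reciprocals of 1, 2, 3, ... while above limit."""
    reciprocals = []
    n = 1
    while 1 / n > limit:      # tested before each pass: body may run zero times
        reciprocals.append(1 / n)
        n = n + 1
    return reciprocals


if __name__ == "__main__":
    print(problem_1(4))
    print(problem_2(60, 62, 64, 66, 68))
    print(len(problem_3()), "reciprocals printed")
```

Because the test in problem_3 is made at the top of the loop, the body runs zero times if the first reciprocal is already too small; with the default limit it collects the reciprocals of 1 through 199, since 1/200 equals 0.005 and is not greater than it.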
(a)
    if (condition)
    then
        block 1
        block 2
    else
        block 3
    endif

(b)
    110 IF (inverse of condition) GOTO 270
    120 . . . 250  (blocks 1 and 2)
    260 GOTO 520
    270 . . . 510  (block 3)
    520 (next statement after IF)

Figure 5: The if. . .then. . .else structure: (a) pseudocode, (b) BASIC equivalent. The first statement (here line 110) is always an IF statement branching on the inverse of the condition given in the pseudocode; the branch is made to line 270, two lines past the last line performed by the "then" branch (line 250). The next statements are the code represented by the "then" branch (here lines 120 to 250), followed by a GOTO to the first statement after the "else" code (here line 260, branching to line 520). After the GOTO is the code for the "else" branch of the if. . .then. . .else statement (here lines 270 to 510). In actual practice, of course, appropriate line numbers would be used in the BASIC program.





(a)
    do while (condition)
        block 1
        block 2
    endwhile

(b)
    110 IF (inverse of condition) GOTO 390
    120 . . . 370  (blocks 1 and 2)
    380 GOTO 110
    390 (next statement after do. . .while loop)

Figure 6: The do. . .while structure: (a) pseudocode, (b) BASIC equivalent. The first statement (here line 110) is an IF statement that branches on the inverse of the given condition to line 390, two lines after the end of the do. . .while loop. The statements comprising the body of the loop are next (here lines 120 to 370), followed by a GOTO to the first line of the loop (here line 380).







(a)
    statement 1
    statement 2
    statement 3
      .
      .
    statement 10

(b)
    110 (statement 1)
    120 (statement 2)
    130 (statement 3)
     .
     .
    200 (statement 10)

Figure 4: The "sequence" structure. The translation from pseudocode (a) to BASIC code (b) is simply one of coding several lines in ascending sequence.



(The answers are in figure 3.) 

Once the idea of solving problems in control structure form becomes natural, coding the problem in BASIC is no more than a straightforward translation (see figures 4, 5 and 6). Notice, as mentioned before, that it is the inverse of the condition in the do. . .while and the if. . .then. . .else that appears in the BASIC code; this is because BASIC uses conditions for jumping instead of for not jumping. Except for that, coding structured BASIC is no more than a matter of practice.
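Because the translation is mechanical, it can be expressed as a small code generator. This Python sketch (an illustration, not the author's tool) lays out the figure 5 and figure 6 patterns as numbered BASIC lines. It takes the already-inverted condition as a string, assumes one BASIC statement per pseudocode statement, and numbers lines in steps of 10; the function names if_then_else and do_while are hypothetical.

```python
def if_then_else(inverse_cond, then_stmts, else_stmts, start=110, step=10):
    """Emit the figure 5 pattern; return (listing, first line after it)."""
    else_line = start + step * (len(then_stmts) + 2)   # past IF, "then" code, GOTO
    next_line = else_line + step * len(else_stmts)
    listing = [f"{start} IF {inverse_cond} GOTO {else_line}"]
    for i, stmt in enumerate(then_stmts, 1):
        listing.append(f"{start + step * i} {stmt}")
    # the "then" branch ends by jumping over the "else" code
    listing.append(f"{start + step * (len(then_stmts) + 1)} GOTO {next_line}")
    for i, stmt in enumerate(else_stmts):
        listing.append(f"{else_line + step * i} {stmt}")
    return listing, next_line


def do_while(inverse_cond, body, start=110, step=10):
    """Emit the figure 6 pattern; return (listing, first line after the loop)."""
    exit_line = start + step * (len(body) + 2)         # past IF, body, GOTO
    listing = [f"{start} IF {inverse_cond} GOTO {exit_line}"]
    for i, stmt in enumerate(body, 1):
        listing.append(f"{start + step * i} {stmt}")
    listing.append(f"{start + step * (len(body) + 1)} GOTO {start}")  # back to test
    return listing, exit_line


if __name__ == "__main__":
    # figure 3c: do while (1/N) > 0.005, with the condition inverted by hand
    lines, _ = do_while("1/N<=.005", ["PRINT 1/N", "N=N+1"])
    print("\n".join(lines))
```

Run as a script, the sketch prints the BASIC loop for figure 3c (lines 110 to 140, exiting to 150); the statement that sets N to 1 would be placed before line 110.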

After enough time for structured programming to become second nature to me, I found that certain applications of strict structured programming resulted in programs that were overly bulky or inelegant. Take the example of a program that sums up user-entered numbers until a zero is encountered. The structured pseudocode and BASIC equivalent are shown in figures 7a and 7b. But notice that flag F1 exists only to signal that the do. . .while loop should be terminated immediately, a situation fully determined by whether or not the last input N is zero. The test of N in statement 150 is the second thing done in the loop that goes from 130 to 190; if control could transfer at the end of the loop to 140, which drops into the test at 150, we could throw away F1 and the do. . .while test at 130, as in 7c, for a savings of one variable and several lines! This happens a lot in programs that interact with the user, so I thought, what if I devise a structure called "read X and do while (condition of X)"? It is still one-in, one-out; it's easy to understand; and it's barely different from a plain do. . .while. So I used it and began looking for other opportunities to add constructs to structured programming as originally conceived.

Figure 7: Improving "strict" structured programming: (a) is the pseudocode for a problem to add user inputs until a zero is encountered, using only sequence, if. . .then. . .else and do. . .while. (b) is the pseudocode of (a) translated into BASIC. (c) is an equivalent BASIC solution that saves three lines of code and one variable by slightly bending the form of a structured programming do. . .while loop.

(a)
    sum = 0
    flag = 1
    do while flag = 1
        input num
        if num ≠ 0
        then
            sum = sum + num
        else
            flag = 0
        endif
    endwhile
    (next statement)

(b)
    110 S=0
    120 F1=1
    130 IF F1<>1 GOTO 200
    140 INPUT N
    150 IF N=0 GOTO 180
    160 S=S+N
    170 GOTO 130
    180 F1=0
    190 GOTO 130
    200 (next statement)

(c)
    110 S=0
    140 INPUT N
    150 IF N=0 GOTO 200
    160 S=S+N
    170 GOTO 140
    200 (next statement)
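To make the comparison concrete, here is a hedged Python sketch of both versions; the function names are hypothetical and a list of numbers stands in for the user's INPUT statements. The first keeps the flag of figure 7a. The second uses Python 3.8's assignment expression so that the read and the test share one line, which is close in spirit to the read X and do while construct of figure 7c.

```python
def sum_with_flag(inputs):
    """Figure 7a: strict structured version, steered by a flag."""
    feed = iter(inputs)
    total, flag = 0, 1
    while flag == 1:              # do while flag = 1
        num = next(feed)          # input num
        if num != 0:
            total = total + num
        else:
            flag = 0              # the flag's only job: end the loop
    return total


def sum_read_and_do_while(inputs):
    """Figure 7c idea: read N, loop while N <> 0, no flag variable."""
    feed = iter(inputs)
    total = 0
    while (num := next(feed)) != 0:   # read X and do while (condition of X)
        total = total + num
    return total


if __name__ == "__main__":
    data = [3, 4, 5, 0, 99]       # input after the zero is never read
    print(sum_with_flag(data), sum_read_and_do_while(data))
```

Both functions stop at the first zero and give the same sum; the second simply has one less variable to keep track of, which is the saving the author points out.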

Do. . .until 

The structure closest to the three basic control structures is the do. . .until loop, illustrated in figure 8. The main difference between it and the do. . .while loop is that, in the do. . .until loop, the expression to be evaluated is the last statement in the loop instead of the first; this ensures that the body of the loop is done at least once.



A do. . .until loop is written in BASIC by 
writing the statements in the body of the 
loop, then adding an IF statement that 
branches to the first statement of the loop 
if the condition is not met (the inverse of 
the original conditional expression is used 
here). 

Consider our earlier problem of adding a number of user inputs until a zero is encountered. The solution to this, using the do. . .until structure, is given in figure 9. Notice in the pseudocode that I've put the conditional relation on the last line of the loop to remind the reader that the test is made at the end, not the beginning, of the loop. You may prefer to write the conditional expression on the same line as the words do until (in this case, line 2 of the pseudocode as do until N = 0), but you will have to remember that the test is delayed until after the last line of the loop.

Figure 8: The do. . .until loop: (a) pseudocode, (b) flowchart (not reproduced), (c) BASIC equivalent. The first statements (here lines 110 to 380) are the code for blocks 1 thru n. The next and last statement (here line 390) is an IF statement that branches on the inverse of the condition listed in the pseudocode to the first statement of the do. . .until loop (here line 110).

(a)
    do until test
        block 1
        block 2
          .
          .
        block n
    endloop if (condition)

(c)
    110 . . . 380  (blocks 1 thru n)
    390 IF (inverse of condition) GOTO 110
    400 (next statement)

Figure 9: Illustration of the do. . .until loop: (a) pseudocode, (b) BASIC equivalent. The problem illustrated is to create code that will sum all user inputs until a zero is encountered, then print the sum.

(a)
    S = 0
    do until test of N
        input N
        S = S + N
    endloop if N = 0
    print S

(b)
    100 S=0
    110 INPUT N
    120 S=S+N
    130 IF N<>0 GOTO 110
    140 PRINT "SUM IS";S

Looking at the BASIC code, there is a one-to-one correspondence between the pseudocode and BASIC code except that line 2, the do until line, has no equivalent, and line 5, the endloop line, translates into an IF statement that branches back to the top of the loop only if N ≠ 0 (that is, only if the inverse of the conditional statement is true). (Looking back at figure 7c, we see that we have improved on our read N and do while N ≠ 0 loop by one statement, mainly because a do until will always have one less BASIC statement than its corresponding do while.)
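Python has no do. . .until statement; a common idiom is an endless while loop whose only exit is a test at the bottom, which keeps the loop one-in, one-out. Here is a hedged sketch of figure 9 (the function name is hypothetical, and a list stands in for INPUT):

```python
def sum_until_zero(inputs):
    """Figure 9 as a bottom-tested loop: the body always runs at least once."""
    feed = iter(inputs)
    s = 0
    while True:              # do until test of N
        n = next(feed)       # input N
        s = s + n            # S = S + N (adding the final zero changes nothing)
        if n == 0:           # endloop if N = 0
            break
    return s


if __name__ == "__main__":
    print("SUM IS", sum_until_zero([3, 4, 5, 0]))
```

As in the BASIC version, the terminating zero is added to the sum before the test, which is harmless here because adding zero leaves S unchanged.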

Case 

The case statement is used when the value of a variable determines which of N mutually exclusive blocks of code is to be executed (with control passing to the next statement after the chosen block is performed); from this, you can see that the if. . .then. . .else structure is a special case of the case statement with N = 2.

(a)
    case on N
        if N=1, block 1
        if N=2, block 2
        if N=3, block 3
    endcase

(b)
    110 ON N GOTO 120,200,250
    120 . . . 180  (block 1)
    190 GOTO 320
    200 . . . 230  (block 2)
    240 GOTO 320
    250 . . . 310  (block 3)
    320 (next statement)

(c)
    110 IF N<>1 GOTO 190
    120 . . . 180  (block 1)
    190 IF N<>2 GOTO 240
    200 . . . 230  (block 2)
    240 IF N<>3 GOTO 320
    250 . . . 310  (block 3)
    320 (next statement)

Figure 10: The case statement: (a) an example in pseudocode, (b) the example in BASIC using a computed GOTO, (c) the example using a series of IF statements. The computed GOTO (b) is used when the values that N takes can be "boiled down" to the integers 1, 2, 3, . . . (for example, if N took the values 10, 15, 20, we would GOTO on (N-5)/5). A series of IF statements (c) would be used when the values of N are irregular and (b) cannot be used.

(a)
    100 FOR I=1 TO 200
     .
    170 IF A>B GOTO 220
     .
    220 . . .
     .
    350 NEXT I
    360 (next statement)

(b) (wrong)
    100 FOR I=1 TO 200
     .
    170 IF A>B GOTO 700
     .
    350 NEXT I
    360 (next statement)

(c)
    100 FOR I=1 TO 9999
    110 IF A>B GOTO 360
     .
    350 NEXT I
    360 (next statement)

Figure 11: Uses of FOR. . .NEXT loops in structured programming. A FOR. . .NEXT loop is okay if no statement within the loop ever transfers control outside the loop; see (a). The situation in (b) is definitely not structured; there is no way to guarantee that line 360 (and subsequent lines) will be done when the FOR. . .NEXT loop is completed. A do. . .while loop may be fashioned as in (c), which is equivalent to do while A ≤ B (provided the loop ends within 9999 passes). Note that the index of the loop, I, is simply "marking time" and as such cannot be used for another purpose within the loop.

(b)
    beginloop
        block 1
        exitif (condition 1)
        block 2
        exitif (condition 2)
        block 3
        exitif (condition 3)
    endloop

(c)
    110 . . . 150  (block 1)
    160 IF (condition 1) GOTO 470
    170 . . . 310  (block 2)
    320 IF (condition 2) GOTO 470
    330 . . . 440  (block 3)
    450 IF (condition 3) GOTO 470
    460 GOTO 110
    470 (next statement)

Figure 12: The beginloop. . .exitif. . .endloop structure: (a) flowchart (not reproduced), (b) pseudocode, (c) BASIC equivalent. Notice the IF statements (here at 160, 320 and 450) all branching to the first line of the next structure (here line 470, three lines after the end of block 3). The second line after the end of block 3 (here line 460) is a GOTO that jumps to the first statement of block 1.

(b)
    beginblock
        block 1
        loopif (condition 1)
        block 2
        loopif (condition 2)
        block 3
        loopif (condition 3)
    endblock

(c)
    110 . . . 150  (block 1)
    160 IF (condition 1) GOTO 110
    170 . . . 310  (block 2)
    320 IF (condition 2) GOTO 110
    330 . . . 450  (block 3)
    460 IF (condition 3) GOTO 110
    470 (next statement)

Figure 13: The beginblock. . .loopif. . .endblock structure: (a) flowchart (not reproduced), (b) pseudocode, (c) BASIC equivalent. Notice that the IF statements (here at 160, 320 and 460) go to the beginning of block 1 and that they branch on the condition itself, not the inverse.

A case statement is implemented in BASIC either by sequential IF statements or, if the variable can be "boiled down" to an integer ranging from 1 to N, by a computed GOTO statement. Remember that, since control eventually passes to the first statement after the case statement, no block within the case statement may contain a GOTO statement except as the last statement within a block, branching to the first statement after the case statement; to do otherwise would damage the structure's property of one-in, one-out.

An example of a case pseudocode state- 
ment and two BASIC equivalents is given in 
figure 10. Note that, when using a computed 
GOTO, each block of code must end with 
GOTO nnn, where nnn is the next line 
after the case statement. In figure 10c, IF 
statements are used to branch around the 
blocks of code if the variable N does not 
have the appropriate value for that block. 
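A rough Python analogue of figure 10b is an indexed table of blocks, with the index computed from the variable the way ON N GOTO computes its target. The sketch below is an illustration under assumptions: the blocks are hypothetical functions, the values 10, 15 and 20 are boiled down with (N-5)/5 as in the figure 10 caption, and any value outside the table runs a default block instead of falling into the next statement as ON. . .GOTO does in many BASICs.

```python
def block_1():
    return "block 1"


def block_2():
    return "block 2"


def block_3():
    return "block 3"


def no_match():
    return "no block"


# Position k in the table holds the block for boiled-down value k+1.
CASE_TABLE = [block_1, block_2, block_3]


def case_on(n):
    """Run exactly one block for n in {10, 15, 20}, then return to the caller."""
    index, remainder = divmod(n - 5, 5)   # (N-5)/5 maps 10, 15, 20 to 1, 2, 3
    if remainder == 0 and 1 <= index <= len(CASE_TABLE):
        return CASE_TABLE[index - 1]()
    return no_match()


if __name__ == "__main__":
    for n in (10, 15, 20, 12):
        print(n, case_on(n))
```

Every block returns to the single exit of case_on, which plays the role of the GOTO 320 that ends each block in figure 10b.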



Subroutines, User Defined Functions, and FOR. . .NEXT Loops

One of the most important features of a structured program is that it is composed of one-in, one-out blocks that are not jumped into or exited from except at the beginning or the end of the block. Therefore, as far as I am concerned, there is no reason from the structured programming point of view why I can't use both subroutines and user defined functions (using the DEF statement) in my structured programs; they are both one-in, one-out constructs of BASIC and save repeating identical code.

Using the FOR. . .NEXT loop is a different matter. Unlike the subroutine or the user defined function, control can be transferred from anywhere inside the loop to anywhere outside the loop; in this case, a FOR. . .NEXT loop by itself is unsuitable for a structured program and should be replaced by either a do. . .until or a do. . .while loop (if used properly, a FOR. . .NEXT loop can be used to implement either of these; see figure 11). But a FOR. . .NEXT loop's most valid use is simply as a shorthand for a block of code to be repeated identically a given number of times. Used this way, the loop keeps the one-in, one-out feature necessary to all structured programming control structures.

Beginloop. . .exitif. . .endloop 

The beginloop. . .exitif. . .endloop struc- 
ture is described in several books detailing 
advanced structured programming tech- 
niques, and while it does not have the gut 
level intuitive appeal the basic three do, it 
keeps popping up in programs I write, so 
it must be fairly useful. 

The flowchart for the beginloop. . .exitif. . .endloop structure is in figure 12a. It is basically a loop with several exit points (do. . .while and do. . .until can be seen as specific cases of this general form). It has one entrance and one exit (several exit points, but the transfer of control is always to the next statement after the loop structure), and an example in pseudocode and BASIC is shown in figures 12b and 12c. Note that here the conditional expression and not its opposite is translated from pseudocode into BASIC (see lines 160, 320 and 450, figure 12c).
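In Python the same shape is an endless loop whose only ways out are break statements, each of which lands on the single statement after the loop. Here is a hedged sketch; the example problem, the function name and the limit are hypothetical, not from the article: keep adding inputs, but exit either when a zero arrives or when the running total passes a limit.

```python
def capped_sum(inputs, limit=100):
    """beginloop ... exitif ... exitif ... endloop, written with break."""
    feed = iter(inputs)
    total = 0
    while True:                 # beginloop
        n = next(feed)          # block 1: input N
        if n == 0:              # exitif N = 0
            break
        total = total + n       # block 2: S = S + N
        if total > limit:       # exitif S > limit
            break
    return total                # every exit arrives here (the next statement)


if __name__ == "__main__":
    print(capped_sum([3, 4, 0, 9]))
    print(capped_sum([60, 50, 5, 0]))
```

As in figure 12c, each test is written as the exit condition itself, not its inverse.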

Other Structures 

Even with all the above structures, I 
keep finding situations that can't be fitted 
into any of them. So when several situa- 
tions came up repeatedly, I modified exist- 
ing structures to fit them and still be of 
the most general use. But, for the structures 
that remain, the emphasis is more on con- 
venience than on utility. 

A very useful variation of the beginloop. . .exitif. . .endloop structure is one that loops (instead of exits) when certain conditions occur. I call this a beginblock. . .loopif. . .endblock structure (see figure 13); it is very useful for performing a certain operation until all of a series of conditions are met. An example of this is the code in figure 14 that requests from the user an integer input between 1 and 10; notice that we loopif the input N is not between 1 and 10, and we also loopif N is not an integer.

I have created another pseudocode instruction called read. . .until valid for the specific purpose of reading and validating a user input, usually when the validation process is very simple. The BASIC code for the above problem is the same as in figure 14b (unless you want to add some error message statements), and the pseudocode is simply:

    read N until valid
        invalid if N not between 1 and 10
        invalid if N not integer

with the two invalid lines not necessarily written down. (Notice that for the last two structures, the conditional expressions used in the pseudocode are not inverted when transferred to the BASIC code.)

(a)
    beginblock
        input N
        loopif N > 10 or N < 1
        loopif N ≠ INT(N)
    endblock
    (next statement)

(b)
    110 INPUT N
    120 IF N>10 OR N<1 GOTO 110
    130 IF N<>INT(N) GOTO 110
    140 (next statement)

Figure 14: An example of beginblock. . .loopif. . .endblock: (a) pseudocode, (b) BASIC equivalent. The problem illustrated is to get an input from the user that is both between 1 and 10 and an integer. Here the second block (between the two loopifs) is empty.
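A hedged Python sketch of figure 14 uses continue for each loopif, so a failed test goes back to the input and only a fully valid value reaches the statement after the loop. Strings in a list stand in for keyboard input, and the function name is hypothetical.

```python
def read_until_valid(responses):
    """beginblock / input N / loopif ... / loopif ... / endblock."""
    feed = iter(responses)
    while True:                    # beginblock
        n = float(next(feed))      # input N
        if n > 10 or n < 1:        # loopif N > 10 or N < 1
            continue
        if n != int(n):            # loopif N <> INT(N)
            continue
        break                      # endblock: every test passed
    return int(n)                  # the next statement sees only a valid N


if __name__ == "__main__":
    print(read_until_valid(["12", "3.5", "0", "7"]))
```

Like the BASIC IF statements in figure 14b, each test here is the loop-back condition itself rather than its inverse. (Python's int truncates toward zero while BASIC's INT rounds down, but the two agree for the values between 1 and 10 that reach that test.)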

The Beginning of the End 

Those are all the structures I've come up with. They may or may not be justified in your mind by the improvements they allow over "strict" structured programming; but each of them is (at worst) a shorthand that takes the programmer a step further from planning a program in instructions and a step closer to planning in well-defined subtasks. And because each of these structures can be built from the IF and GOTO statements of any language that allows unlimited GOTOs, it is simple to write structured programs in BASIC (or in any other all-purpose language), using modules of code that are functionally independent and "one-in, one-out."

However, there are several other aspects of problem solving that can benefit from the application of a methodical technique: problem definition, program design, debugging and testing, and program revision. (This becomes less of a luxury and more of a necessity as program size increases.) In the second part of this article I'll use the problem of writing a game to play NIM to illustrate the use of structured programming in the entire problem solving process.
Applied Structured Programming . . . and How to Use It, Part 2



Gregg Williams 



In part 1 I covered the basic constructs of structured programming, several additional structures (see table 1), and how to program them in BASIC. Now I want to show my idea, at least, of good programming habits, as well as the application of structured programming techniques to the entire range of problem solving. As an example, I will show how I went about writing a program that plays the game of NIM.

NIM as a Computer Game 

I picked NIM because it is simply ana- 
lyzed, making it possible to concentrate on 
the writing of the program and not on the 
development of the computer's playing 
strategy. 



Basic Structured Constructs
    sequence
    if. . .then. . .else
    do. . .while

Added Structured Constructs
    do. . .until
    case
    subroutines
    for loops
    beginloop. . .exitif. . .endloop
    beginblock. . .loopif. . .endblock
    read. . .and do while
    read. . .until valid

Table 1: The constructs of applied structured programming, as explained in part 1. The "basic" constructs are universally recognized as being sufficient to implement any program; the "added" constructs are recommended by the author as extensions of the basic constructs that make structured programming more versatile and manageable.



The rules are as follows: the game starts 
with a pile of, say, 17 sticks. Players alter- 
nate turns, taking one, two or three sticks. 
The person taking the last stick loses. 

It doesn't take much analysis to show 
that a player is in a "safe" position if there 
are 1, 5, 9, 13,. . .pieces after his or her 
move. No matter how an opponent moves, 
the player can take enough sticks (four 
minus opponent's move) to put the game 
back to a "safe" position. The player's 
opponent is hamstrung and will definitely 
lose. 

The computer's strategy is given in table 2 and is based on what the pile of sticks looks like in terms of multiples of 4. The computer wants to leave the pile in the form (4n+1), a safe position for it. But the computer is in a bad situation if the pile looks like (4n+1) at the beginning of its turn (it also means the human player is in a "safe" position and will win if the correct moves are made). In this case, the moves of 1, 2 and 3 all leave the computer in an "unsafe" position; so, for this program, I decided to let the computer take 1 so as to prolong the game.
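Table 2 reduces to a lookup on the remainder of the pile divided by 4. The following Python sketch (the names TAKE and computer_move are hypothetical, and this is not the program developed in this article) encodes that table and lets you check the claim that, from any pile not of the form 4n+1, the computer's move leaves 4n+1 sticks.

```python
# Table 2: remainder of (pile / 4) at the start of the computer's turn
# mapped to the number of sticks the computer takes.
TAKE = {0: 3, 1: 1, 2: 1, 3: 2}


def computer_move(pile):
    """Number of sticks the computer takes, following table 2."""
    return TAKE[pile % 4]


if __name__ == "__main__":
    for pile in range(2, 18):
        left = pile - computer_move(pile)
        status = "safe" if left % 4 == 1 else "unsafe"
        print(f"{pile:2d} sticks: take {computer_move(pile)}, "
              f"leaving {left} ({status})")
```

Running it shows "unsafe" only for piles of 5, 9, 13 and 17, the 4n+1 cases where the computer takes 1 to prolong the game.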

A Problem Solving Approach 

Before I get into the hand waving that 
will enable you to see how I wrote this pro- 
gram, I'd like to give you an overview of 
how I think a program should be attacked 
a la structured programming. 

Step 1: Define the program in terms of what it will and will not do. Don't laugh: who hasn't been coding a program only to remember, "Omigosh, I forgot to put in something to . . ."? Keeping last-minute additions or afterthoughts to a minimum reduces the possibility of unexpected interaction between statements, often called bugs, glitches, blowups and so on.

Step 2: Flowchart "the big picture." A lot of this is intuitive, but it means breaking the program into the first subprograms that come to mind and showing where these come in program flow. Unless a program is very simple, it is hard to go straight past this step into step 3; I have to literally see the program flow in flowchart form before I can begin thinking in terms of if. . .then. . .else and do. . .while and other control structures.

Step 3: Translate this into an overview with structured pseudocode. By now you have to have a loose idea of what the subprograms (let's call them modules) will do. You don't have to have module definition pinned down to the finest point, for simply having thought in terms of modules means you've already put more thought into this stage than most people do. Also, gluing the modules together with structured pseudocode gets you started toward a structured program; it won't be structured unless it starts structured.

Step 4: Program each module in pseudo- 
code. Aha, here's where you find out what 
you've left out. Notice I said "program." 
I mean it: you should write out exactly 
how a module will be executed as if it 
were the program that goes into the com- 
puter. The pseudocode should be so detailed 
that translating it into BASIC (or any other 
language) is almost a mechanical chore. This 
step may cause you to go back to step 3, 
but that's okay, for it is probably faster to 
revise on paper than in the computer. 

A note on modules: a key factor in the success of a structured program is the functional independence of modules. This means that a module should do a certain thing regardless of what the modules before it do, thus minimizing the possibility of unexpected module interaction. For example, if module A is designed to perform some computation on variables X and Y giving result Z, the only way module B, which calls module A, should be able to influence results is by changing the inputs X and Y prior to the call. The internal machinations of B should not affect A except through the identified input and output parameters of module A.

Step 5: Translate each module into 
BASIC code. Using the forms I outlined 
in part 1, going from pseudocode modules 
to BASIC modules is a mechanical trans- 
lation process; the only thing you really 
need to think about is assigning and keeping 
track of variables and functions. I use a 
chart to do that; see figure 8 for an example. 
Step 6: Test each module. You're on 
your own here, but you must do something 
to check out what a module is supposed to 
be doing in terms of function, input and 
output. This is a "bottom up" approach 
to programming (note, however, that the 
design is "top down"). Although "top 



Number of Sticks 


Number of Sticks 


in Pile at Beginning 


Computer Takes 


of Computer's Turn 




4n 


3 


4n+1 


1 


4n+2 


1 


4n+3 


2 



down" programming has been praised for 
its ability to catch unexpected module 
interaction, I ask: how can it until those 
modules (mistakes and all) have themselves 
been written? (An incisive analysis of the 
design process is given by Knuth in The Art 
of Computer Programming, Fundamental 
Algorithms, volume l, pages 187 to 189 in 
the second edition.) 

Step 7: Test the program. Glue the 
modules together with the BASIC equiva- 
lent of the pseudocode from step 3. Start 
it running and hunt down bugs. Even if it 
works, keep hunting until you are tired 
of running the program. The brevity of 
this step (and the assurance that the 
program will not one day unexpectedly 
blow up) is your reward for the work done 
in the first six steps. 

Step 8: if you add to the program, add 
structured code. I know it's hard to do. It's 
even hard for me to do, and I'm the one 
who's writing this article. But, unless the 
addition is extremely trivial, make sure 
that the code you add fits in, in a structured 
sense. Don't jeopardize functional indepen- 
dence. Do break down a module if 
necessary to rewrite it. 

NIM: Initial Design (Steps 1 thru 3) 

Now we're ready to work on the NIM 
playing program. After thinking about the 
possibilities, I decided on this rough working 
definition: This program will play a series of 
NIM games against a human opponent. It 
will use the residue of four algorithm for its 
strategy and will give the user the option of 
choosing who goes first and how many 
sticks are in the pile; the default will be 17 
sticks and human goes first, an automatic 
win for the computer. The program will also 
check human inputs for validity. 

My initial flowchart is in figure 1. Notice 
that there are four basic modules: initiali- 
zation, player-turn, computer-turn and 
evaluation. If you want to go into more 
complex detail (and you will have to in a 
larger program), you can say that initiali- 
zation basically sets the number of sticks 
in the pile and who goes first. Player-turn 
accepts a move, checks its validity, and 
subtracts the move from the pile. Computer- 
turn analyzes the pile, chooses a move, and 
subtracts the move from the pile. Evaluation 



Resulting 
Position for 
Computer 

safe 
unsafe 
safe 
safe 



Table 2: Computer's strat- 
egy for the NIM game. 
Either player is guaranteed 
a win (assuming that 
player makes no mistakes) 
if the pile of sticks is a 
number of the form 4n+1 
at the end of his/her turn. 
Notice that when the play 
begins with 4n+1 sticks, 
the computer is forced to 
take one, two or three 
sticks and so leaves itself 
in an unsafe position at 
the end of its move. 
Figure 1: A high level flowchart for the NIM program. Most of the blocks represent some large chunk of the overall problem; such blocks are called "modules." Actions common to more than one block should be brought outside the blocks and shared. For example, the block labeled "evaluation" was part of both modules "player-turn" and "computer-turn" until it was seen that it could be brought outside and made a module of its own.



[Flowchart: BEGIN; PLAYER-TURN or COMPUTER-TURN; EVALUATION (OF BOARD); SET TO OTHER PLAYER; GAME ENDED? (if no, next turn); MESSAGES; SESSION ENDED? (if no, next game); END]



Figure 2: The structured pseudocode overview. This overview, equivalent to the flowchart of figure 1, is written in a pseudocomputer language and is the first step past the flowchart toward a completed BASIC program. Notice that the modules, whose details will be filled in later, are noted as names enclosed in parentheses; that preliminary variables are used and given descriptive names; and that the entire problem is outlined by this pseudoprogram.
NIM:
   games-played = 0
   do until test of endsession
      (initialize)
      do until test of gamewon
         if computer's-turn = 0 then
            (player-turn)
         else
            (computer-turn)
         endif
         (evaluate)
         computer's-turn = 1 - computer's-turn
      endloop if gamewon
      print endgame messages and ask if user wants to play again
      receive user response
      if response = yes [ie: if endsession = 0]
         games-played = games-played + 1
      endif
   endloop if endsession
end-of-program



checks to see if the pile is down to 1 (in 
which case the winner is declared) and also 
makes several comments as endgame ap- 
proaches (this is the only module whose 
function grew as the module was written). 

The first real work is done with the creation of the structured pseudocode overview in figure 2. The process is fairly simple here because the program is well-defined as a flowchart. If a flowchart has constructs that are not recognizable control structures, however, you have to twist the flowchart (maybe even rewrite it) until you can see it in terms of sequence, if. . .then. . .else and do. . .while.

Notice that at this step you begin to define flags. There will be a flag (with a value of 1 or 0, standing for true or false) for each of endsession, gamewon and computer's-turn. Notice also that the variable names are descriptive enough for the reader to understand exactly what is happening; be sure to keep this first pseudocode overview readable.
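The hypothetical Python function below mirrors the shape of figure 2 in a language with real loops. It roughly illustrates how the overview maps onto code: two nested do. . .until loops are driven by the gamewon and endsession flags, and computer's-turn is toggled with 1 - computer's-turn. The human, computer and play-again callbacks are assumptions standing in for the modules and the user's replies. Move validation is left to the callbacks. Unlike figure 2, the sketch also ends a game when the pile reaches zero (a player took the last stick), a case the overview does not mention.

```python
from typing import Callable

# Hypothetical callback type: receives the pile size, returns sticks taken (1 to 3).
Mover = Callable[[int], int]


def play_session(start: int, human_first: bool, human: Mover, computer: Mover,
                 play_again: Callable[[int], bool]) -> list[str]:
    """Follow the structured overview; return the winner of each game."""
    winners: list[str] = []
    games_played = 0
    endsession = False
    while not endsession:                        # do until test of endsession
        sticks = start                           # (initialize)
        computers_turn = 0 if human_first else 1
        gamewon = False
        while not gamewon:                       # do until test of gamewon
            mover = "computer" if computers_turn == 1 else "human"
            if computers_turn == 0:
                sticks -= human(sticks)          # (player-turn)
            else:
                sticks -= computer(sticks)       # (computer-turn)
            if sticks <= 1:                      # (evaluate)
                # Leaving 1 stick wins; taking the last stick (leaving 0) loses.
                other = "human" if mover == "computer" else "computer"
                winners.append(mover if sticks == 1 else other)
                gamewon = True
            computers_turn = 1 - computers_turn  # set to other player
        if play_again(games_played):             # response = yes
            games_played += 1
        else:
            endsession = True                    # endloop if endsession
    return winners
```

For example, `play_session(17, True, lambda s: 1, lambda s: 1, lambda n: False)` plays one game in which both sides take a single stick each turn.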

NIM: Detailed Design (Step 4) 

The bulk of thinking from here on out is 
in this step, the writing of each module of 
pseudocode. You will probably discover 
changes and additions you need to make; 
the advantage of doing so at this point is 
that it is easier to make corrections and 
revisions in pseudocode than it is to make 
them in BASIC. One reason for this is that, 
since pseudocode is not read by the com- 
puter, you do not have to spend any time 
making sure that it is syntactically correct 
(instead, you spend the same time making 
more changes); another is that pseudocode is 
easier to change because it is easier to 
read. Compare the pseudocode "if who-plays-first = computer, then. . ." with the BASIC statement "1600 IF P=0 GOTO 1660."

The pseudocode for my basic modules 
is in figures 3 thru 7. Although I made 
really only one draft of each module 
before I translated it to BASIC, the 
draft itself contains many erasures 
and insertions; for me, working in pencil 
is a must. You may find an operation 
that occurs several times within the 
different modules; if so, you'll want to
make it either a module or a subroutine. In 
the case of the NIM program, I decided to 
make a module out of the part of code 
needed to print the current board position. 
I could have left it as part of the evaluate 
module, but to do so would have obscured 
the module's purpose with too much detail. 
As it stands, the only information given in
the evaluate module is "Print current board



Figure 3: The initialize module.

Relative  BASIC
Line No   Line No  Pseudocode

                   initialize:
   1        220       if games-played = 0
   2        230          ask if user wants instructions
   3        240          read user-answer until valid
   4        260          if answer is 'yes'
   5        280             print instructions
   6        310          endif
   7        310       endif
   8        330       ask user if he wants to choose number of sticks and who goes first
   9        360       read user-choice until valid
  10        380       if user-choice = default
  11        390          number-of-sticks = 17
  12        400          computer's-turn = 0
  13                  else
  14        420          ask user how many sticks to begin with
  15        440          read number-of-sticks and do while number-of-sticks < 13
  16        460             error message 'sorry, we have to have at least 13 sticks'
  17        470          endwhile
  18        490          ask user who goes first
  19        500          read computer's-turn until valid
  20        510       endif



The pseudocode in figures 3 to 7 represents the second level of breaking 
a problem into subproblems (the first level was from defined problem to 
the structured overview of figure 2). Notice that the trend is to write the 
lines so that they are easily understood rather than to make them look 
like formal computer code. The BASIC line numbers given for most of the lines represent the beginning line numbers of the equivalent statements in the BASIC program.



Figure 4: The player-turn module.

Relative  BASIC
Line No   Line No  Pseudocode

                   player-turn:
   1                  do until test of valid-move
   2        845          valid-move = 1
   3        830          ask user for his move
   4        840          input user-move
   5        850          if user-move not between 1 and 3
   6        855             print error message
   7        860             valid-move = 0
   8        860          endif
   9        870          if user-move > number-of-sticks
  10        875             print error message
  11        880             valid-move = 0
  12        880          endif
  13        900       endloop if valid-move = 1
  14        920       number-of-sticks = number-of-sticks - user-move


Figure 5: The computer-turn module.

Relative  BASIC
Line No   Line No  Pseudocode

                   computer-turn:
   1       1120       remainder = number-of-sticks modulo 4
   2       1140       case on remainder
   3       1150          if remainder = 0, then computer's-move = 3
   4       1170          if remainder = 1, then computer's-move = 1
   5       1170          if remainder = 2, then computer's-move = 1
   6       1190          if remainder = 3, then computer's-move = 2
   7       1210       number-of-sticks = number-of-sticks - computer's-move
   8       1230       print computer's-move to user
Relative  BASIC
Line No   Line No  Pseudocode

                   evaluate:
   1       1415       (print-board)
   2       1520       if number-of-sticks = 1
   3       1540          if computer's-turn = 0
   4       1550             print computer-loses message
   5                     else
   6       1570             print user-loses message
   7       1570          endif
   8       1570       endif
   9       1590       if number-of-sticks between 6 and 8
  10       1600          if computer's-turn = 1
  11       1620             if RND<0.6
  12       1630                print computer-resigns message
  13       1640                number-of-sticks = 1
  14       1640             endif
  15                     else
  16       1660             if RND<0.3
  17       1670                print you're-in-trouble message
  18       1690                ask if user wants to resign
  19       1700                input user-answer until valid
  20       1720                if user-answer = 'yes'
  21       1730                   print user-resigns message
  22       1740                   number-of-sticks = 1
  23                           else
  24       1760                   print nasty answer
  25       1760                endif
  26       1760             endif
  27       1760          endif
  28       1760       endif



Figure 6: The evaluate module. "RND" refers to a random number between 
zero and one; when either player resigns, number-of-sticks is set to 1 to signal 
end-of-game. 



position," which tells the reader exactly what is being done; if the reader wants more detailed knowledge, it is possible to refer to the print-board module.

The value of pseudocode can be seen in the fact that very little in the program needs to be explained. This is why high level languages which are closer to pseudocode make better programming languages than BASIC. Figures 3 and 4 are complicated only by read. . .until valid statements that check user responses (checking input data is usually a good idea unless your computer has real space problems). The computer-turn module, figure 5, implements the computer strategy of table 2.
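The read. . .until valid pattern can be isolated in one small routine. Here is a sketch in Python. The `read_until_valid` helper is hypothetical, and it draws responses from an iterator that stands in for BASIC's INPUT statement, so it can be tested without a keyboard.

```python
from collections.abc import Callable, Iterator


def read_until_valid(responses: Iterator[str],
                     is_valid: Callable[[int], bool],
                     complain: Callable[[str], None] = print) -> int:
    """Return the first response that is a whole number and passes is_valid.

    Each rejected response is reported through `complain`; running out of
    responses raises EOFError.
    """
    for text in responses:
        try:
            value = int(text)
        except ValueError:
            complain(f"{text!r} IS NOT A NUMBER")
            continue
        if is_valid(value):
            return value
        complain(f"{value} IS NOT ONE OF THE CHOICES")
    raise EOFError("input ended before a valid response was given")


# Usage: a 1-or-0 question like the instructions prompt in the initialize module.
answer = read_until_valid(iter(["2", "yes", "1"]), lambda v: v in (0, 1))
```

The same helper covers the "at least 13 sticks" check by passing `lambda v: v >= 13`.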

In the evaluate module, figure 6, if the player who has just moved leaves six, seven or eight sticks and the opponent plays perfectly, the player who has just moved will definitely lose the game (the opponent can take one, two or three sticks, respectively, finishing with five sticks, a "safe" position for the opponent). The if statement beginning "if number-of-sticks between 6 and 8" (line 9 of figure 6) and ending with the last "endif" of the module takes care of this situation. If the computer is about to lose, it resigns six-tenths of the time; if the human is about to lose, the computer offers the human a chance to resign three-tenths of the time. (Notice, in figure 1, that the evaluate module comes before the variable computer's-turn is changed, so that, in the evaluate module, computer's-turn=0 means that the human has just played and that the computer's turn is next.)
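The claim that leaving six, seven or eight sticks loses against perfect play can be checked by brute force. The Python sketch below is an illustration, not part of the original program. `mover_wins` searches every possible reply. `resignation` restates lines 9 to 28 of figure 6, with an injectable random-number function in place of RND; both names are hypothetical.

```python
import random
from functools import lru_cache
from typing import Callable, Optional


@lru_cache(maxsize=None)
def mover_wins(sticks_left: int) -> bool:
    """True if the player who just left `sticks_left` sticks wins with perfect play."""
    if sticks_left == 0:      # the mover took the last stick and loses
        return False
    if sticks_left == 1:      # the opponent must take the last stick
        return True
    # The opponent moves next; the mover wins only if every reply loses for them.
    return all(not mover_wins(sticks_left - take)
               for take in (1, 2, 3) if take <= sticks_left)


def resignation(sticks_left: int, computer_just_moved: bool,
                rnd: Callable[[], float] = random.random) -> Optional[str]:
    """Sketch of the resignation logic: 6 to 8 sticks left means the mover is doomed."""
    if not 6 <= sticks_left <= 8:
        return None
    if computer_just_moved:
        return "computer resigns" if rnd() < 0.6 else None
    return "offer user resignation" if rnd() < 0.3 else None
```

Listing the winning piles with `[n for n in range(1, 30) if mover_wins(n)]` shows exactly the 4n+1 positions of table 2.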



NIM: Translating to BASIC (Step 5) 

Two aspects of the pseudocode-to-BASIC 
translation need attention: the translation 
itself (covered in Part 1), and the assign- 
ment of variable names and meanings. The 
translation, although it requires attention, 
is straightforward enough once you are used 
to it. Assigning BASIC line numbers cor- 
responding to relative line numbers of 
pseudocode (given in figures 3 to 7) should 
make matters easier. 

The assignment of variable names, 
however, is another matter. Each named 
variable or flag must be replaced with a 
letter or letter-plus-digit name. I think it is a 
good idea to keep track of what names have 
been used, what modules they are used in, 
and what they are used for. An example of 
the chart I usually make (here, for the NIM 
program) is in figure 8. I also try to decide 
whether or not I need to store a variable's 
value for use later in the program; if not, I 
can use the same variable later and save a 
few bytes of storage (when compared with 
creating and using a new variable). 

When writing a module of BASIC code, I write the entire module, using a circled letter in colored pencil to link the space after a GOTO to the line it belongs with. Then I number the entire module and replace the circled letters with the correct line numbers.
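The circled-letter technique is essentially two-pass label resolution: number every line first, then substitute the numbers for the symbolic targets. Below is a Python sketch of that idea. It uses a hypothetical convention in which a line beginning `A:` carries label A and `@A` in a GOTO refers to it. The step of 10 and the four-digit format only mimic listing 1.

```python
import re

LABEL = re.compile(r"^([A-Z]):\s*")   # hypothetical 'A:' label at the start of a line
TARGET = re.compile(r"@([A-Z])")      # hypothetical '@A' reference inside a GOTO


def number_module(lines: list[str], start: int, step: int = 10) -> list[str]:
    """Pass 1 numbers every line and records labels; pass 2 fills in the targets.

    Raises KeyError if a target names a label that was never defined.
    """
    labels: dict[str, int] = {}
    numbered: list[tuple[int, str]] = []
    for i, line in enumerate(lines):
        number = start + i * step
        match = LABEL.match(line)
        if match:
            labels[match.group(1)] = number
            line = line[match.end():]
        numbered.append((number, line))

    def resolve(m):
        return f"{labels[m.group(1)]:04d}"

    return [f"{n:04d} {TARGET.sub(resolve, text)}" for n, text in numbered]


for line in number_module(["INPUT M1", "IF M1<=S GOTO @A",
                           'PRINT "TOO MANY"', "A: S=S-M1"], start=100):
    print(line)
```

Because targets are resolved only after every line has a number, forward references such as `GOTO @A` work just as the pencil letters did.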

Comments are very important and, unless 
you are working with severe memory restric- 
tions, there is no excuse for your not using 
them. (Even with memory problems, put 
comments in your final draft and keep a 
copy.) For example, the comment 



2150 REM K IS THE SUM OF I AND J
2160 K=I+J



is extremely lame. But if, in its place, you 
write 



2150 REM K IS SUM OF FIRST AND SECOND GROUP SCORES



then, within the context of the program, the comment will probably remind you (after a long absence) of several things you'd forgotten. Given the restrictions on variable names in BASIC, comments are more necessary than they would be in languages that allow longer names.
I might point out several places where you should always use comment statements. One is at the beginning of the program, for a summary of the name of the program, your name as its author, its purpose, and so on. See lines 50 to 70 in the NIM program in listing 1 for an example.

Another place to put comments is at the 
beginning and end of modules, if possible, 
with some eye-catching typography. BASIC 
programs do seem to run together after a 
while (see lines 200, 520, 800, 930 and 
others in listing 1). 

A third place to put comments is just 
before a major control structure (ie: one 
spanning more than a few lines of code). 
Gertrude Stein might not have said, "An if 
is an if is an if. . .," but she should have. 
Things are easier if you know that an IF 
statement is actually the beginning of a 
do. . .until, an if. . .then. . .else, or some- 
thing else. For example, look at the com- 
ments at lines 810 and 890 of the NIM 
program: 

0810 REM DO UNTIL 900; ENDLOOP IF VALID (V=1)
     (body of do. . .until)
0900 IF V=0 GOTO 0820

A glance at line 810 tells us we are beginning a do. . .until that ends at 900; it also tells us the condition and the reason for looping.

The benefits of heavily commenting a program are so intangible that it is difficult to justify the time, memory and effort that commenting requires. You have always heard that comment lines greatly help a programmer who must examine a program weeks or months after it is written. But you probably do not realize that the very act of writing down the comment, of trying to find the few most important words that will clarify the situation, not only helps you remember a given fact longer but also causes you to analyze the situation (and thereby understand it better), and may even lead you to find a mistake you had not seen.

Remember, comments may take effort, 
but the whole idea of structured program- 
ming techniques is that effort on the front 
end will save greater efforts later on. 
Pseudocode

print-board:
   print 'THE BOARD IS ';
   sticks = number-of-sticks
   do while sticks > 5
      print '/////';
      sticks = sticks - 5
   endwhile
   for I = 1 to sticks
      print '/';
   next I
   print ' '



Figure 7: The print-board module. This module is actually part of the evaluate module but is separated from it for purposes of clarity. Sticks is a new variable, a working copy of number-of-sticks that is used up as the current board position is printed. Notice the semicolons in the print statements; as in BASIC, they make the statements print on the same line.
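For comparison, the print-board pseudocode translates almost line for line into Python. This is a sketch. The function name is hypothetical, and the single space after each group of five sticks is an assumption about the output format, not something figure 7 states.

```python
def board_string(number_of_sticks: int) -> str:
    """Build the board line: groups of five while more than five remain, then singles."""
    parts = ["THE BOARD IS "]
    sticks = number_of_sticks       # working copy, like S1 in listing 1
    while sticks > 5:               # do while sticks > 5
        parts.append("///// ")      # assumption: a space separates the groups
        sticks -= 5
    parts.append("/" * sticks)      # replaces the for I = 1 to sticks loop
    return "".join(parts)


print(board_string(13))
```

Building the string before printing it plays the same role as the trailing semicolons in the pseudocode.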
Name   Use

N1     number of games completed; used outside all modules
C      user-choice to choose sticks and first player, temporary
C      user-choice to resign, temporary
S      number-of-sticks
P      computer's-turn; indicates next to play: computer=1, player=0
V      during player's turn, indicates if move valid: 1=yes, 0=no
M1     player-move     } both 1, 2 or 3
M9     computer-move   }
R      remainder of number-of-sticks (S) modulo 4, temporary
S1     equivalent to S, destroyed by the print-board module

Figure 8: A table to keep track of variables used. This table shows which valid 
BASIC variable names are being used; in which modules they are used; the 
variable's meaning and whether or not the variable's value needs to be saved. 
Note that C is a temporary variable; since its value in the initialize module 
need not be saved, it is used again in the evaluate module for another pur- 
pose. In a more complex program, you would make a note by the variable 
name if it is an array (numeric or character) as opposed to a simple variable. 



NIM: Testing (Steps 6 and 7) 

A module is tested by writing code 
around it that provides it with the variables 
that affect the module's behavior and state- 
ments that somehow display the module's 
output. Then the module-plus-test-routine 
should be run, varying the inputs across 
their spectrum as much as is practical 
(testing all possible input combinations is 
the only foolproof method but, alas, be- 



comes infeasible very quickly). The outputs 
should be predicted before the test is run 
and then verified; "eyeballing" the outputs 
often lets mistakes slip by that you would 
otherwise catch. 
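The following is a module test in the sense just described: a small driver supplies the inputs, and the outputs are predicted before the run. The Python sketch below uses a hypothetical `computer_turn` stand-in for figure 5. It writes the predictions down as a table and reports any mismatch, so nobody has to eyeball the output.

```python
def computer_turn(number_of_sticks: int) -> tuple[int, int]:
    """Stand-in for the computer-turn module: returns (move, sticks left)."""
    move = {0: 3, 1: 1, 2: 1, 3: 2}[number_of_sticks % 4]
    return move, number_of_sticks - move


# Predictions written down *before* running the test.
PREDICTED = {
    13: (1, 12),   # 4n+1: no safe move, so take one
    14: (1, 13),
    15: (2, 13),
    16: (3, 13),
    3: (2, 1),
    2: (1, 1),
    1: (1, 0),
}


def run_module_test() -> list[str]:
    """Compare each actual output with its prediction; return the mismatches."""
    failures = []
    for sticks, expected in PREDICTED.items():
        actual = computer_turn(sticks)
        if actual != expected:
            failures.append(f"sticks={sticks}: expected {expected}, got {actual}")
    return failures


if __name__ == "__main__":
    problems = run_module_test()
    print("all predictions matched" if not problems else "\n".join(problems))
```

An empty failure list means that every prediction was confirmed. A wrong prediction is worth investigating too, since it means either the module or your understanding of it is off.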

Program testing is usually more frus- 
trating than module testing, mainly because 
Listing 1: The completed NIM game, written in BASIC. This program plays multiple games of NIM against a human opponent with endgame messages to the user that differ from game to game. The two most important characteristics of the program are, first, the liberal use of REM statements, and second, the coding of the program in terms of structured programming control structures, which greatly simplifies program design and debugging. (This program was run on an IBM 5100.)
0050 REM ***NIM PROGRAM, WRITTEN BY GREGG WILLIAMS***
0060 REM **WRITTEN 15 APR 77, LAST UPDATE 16 APR 77**
0070 REM ** TRY IT--IT CAN BE BEATEN **
     N1=0
0200 REM ****** MODULE INITIALIZE--END AT 520 ******
0210 REM --GIVE USER INSTRUCTION OPTION IF FIRST GAME (N1=0)--
0220 IF N1<>0 GOTO 0320
0230 PRINT "DO YOU WANT INSTRUCTIONS? (1=YES, 0=NO)"
0240 INPUT N1
0250 IF N1<>0&N1<>1 GOTO 0240
0260 IF N1=0 GOTO 0320
0270 PRINT " "
0280 PRINT "OKAY. NIM IS PLAYED WITH 17 OR MORE STICKS, WITH A"
0290 PRINT "MOVE CONSISTING OF EACH PLAYER'S TAKING 1, 2, OR 3"
0300 PRINT "PLAYING PIECES, OR 'STICKS,' ON ALTERNATE TURNS,"
0310 PRINT "AND THE PLAYER FORCED TO TAKE THE LAST STICK LOSES."
0320 PRINT " "
0330 PRINT "I USUALLY PLAY WITH 17 STICKS AND YOU GOING FIRST."
0340 PRINT "TYPE 1 IF THAT'S OK WITH YOU. OTHERWISE"
0350 PRINT ...
0360 INPUT C
0370 IF C<>1&C<>0 GOTO 0360
0380 IF C=0 GOTO 0420
0390 S=17
0400 P=0
0410 GOTO 0520
0420 PRINT "HOW MANY STICKS DO YOU WANT TO START WITH?"
0440 INPUT S
0450 IF S>=13 GOTO 0480
0460 PRINT "**SORRY--WE HAVE TO HAVE AT LEAST 13 STICKS**"
0470 GOTO 0440
0480 PRINT " "
0490 PRINT "TYPE ZERO (0) TO GO FIRST; ELSE TYPE 1"
0500 INPUT P
0510 IF P<>0&P<>1 GOTO 0500
0520 REM --END OF MODULE INITIALIZE--
0590 REM
0600 IF P=1 GOTO 1100
0610 REM
0800 REM ****** MODULE USER'S-TURN--END 930 ******
0810 REM DO UNTIL 900, ENDLOOP IF MOVE IS VALID (V=1)
0820 PRINT " "
0830 PRINT "YOUR TURN--ENTER YOUR MOVE"
0840 INPUT M1
0845 V=1
0850 IF M1>=1&M1<=3 GOTO 0870
0855 PRINT "YOUR MOVE ISN'T BETWEEN 1 AND 3"
0860 V=0
0870 IF M1<=S GOTO 0890
0875 PRINT "THERE AREN'T THAT MANY PIECES LEFT"
0880 V=0
0890 REM --NEXT STMT IS TEST FOR END OF DO UNTIL LOOP--
0900 IF V=0 GOTO 0820
0910 REM --TAKE AWAY FROM CURRENT NUMBER OF STICKS--
0920 S=S-M1
0930 REM --END OF MODULE USER'S-TURN--
1000 GOTO 1400
1100 REM ****** MODULE COMPUTER'S-TURN--END 1240 ******
1110 REM --R IS (#STICKS) MODULO 4--
1120 R=S-4*INT(S/4)
1130 REM CASE STMT--R HAS VALUE 0,1,2,3 SO GO ON R+1
1140 GOTO 1150,1170,1170,1190 ON (R+1)
1150 M9=3
1160 GOTO 1210
1170 M9=1
1180 GOTO 1210
1190 M9=2
1200 REM DECREMENT STICKS AND INFORM USER OF YOUR MOVE
1210 S=S-M9
1220 PRINT " "
1230 PRINT "MY MOVE IS ";M9;" STICK";
1233 IF M9=1 GOTO 1238
1235 PRINT "S"
1236 GOTO 1240
1238 PRINT " "
1240 REM --END OF MODULE COMPUTER-MOVE--
1250 REM
1400 REM ****** MODULE EVALUATE--END 1770 ******
1410 REM --PRINT CURRENT BOARD--
1415 PRINT "THE BOARD IS ";
1420 S1=S
1430 IF S1<=5 GOTO 1470
1440 PRINT "///// ";
1450 S1=S1-5
1460 GOTO 1430
1470 FOR I=1 TO S1
1480 PRINT "/";
1490 NEXT I
1500 PRINT " "
1510 REM --IF #STICKS=1 DO THE FOLLOWING; NEXT STMT STARTS 1580--
1520 IF S<>1 GOTO 1580
1525 PRINT " "
1530 REM --NESTED IF--IF COMP'S TURN, RESIGN; ELSE USER LOSES--
1540 IF P=1 GOTO 1570
1550 PRINT "OUCH! IT LOOKS LIKE I LOST THAT ONE--NICE GAME."
1560 GOTO 1580
1570 PRINT "SORRY, CHUM! THAT'S THE LAST STRAW--YOU LOSE."
1580 REM --NEXT IF DEALS WITH COMP/USER RESIGNATION IF S=6,7,8--
1590 IF S>8|S<6 GOTO 1770



only the most elusive bugs evade module 
testing. But the method of predicting pro- 
gram behavior and output for a given set of 
inputs remains much the same as for module 
testing. 

Because I had only four modules and such a simple design, I skipped module tests and went on to test the entire program by playing a few games. I found the following errors: two typing errors, flag V was not set (line 845 was added), and a flag was set wrong at 1540. At this point, the program was functionally working.

NIM: Additions (Step 8) 

At this point, the NIM program is finished and running. However, after playing several games, I noticed little things that bothered me: lines of output bunching together when they were not logically connected, error messages that needed to be included, and the computer writing "MY MOVE IS 1 STICKS," to name a few. So I repaired several things; most of the repairs are evident from the lines in the BASIC program whose numbers do not end in zero.

One addition that does not show up this way is the if. . .then statement at lines 220 and 230 that skips the asking-of-rules (lines 2 thru 6 in figure 3) for every game but the first. Fortunately, this could be added fairly easily by adding a new variable, games-played (or N1), updating it (at line 1990), and placing lines 230 thru 320 in an if. . .then structure that gets done only if games-played equals zero.

Sometimes it takes more effort to add 
code so that the resulting program is still 
structured. But programs resemble organisms 
in that they tend to grow quite a bit after 
the first time they are "finished." So, in 
the interest of maintaining a structured 
program (which is easier to work on), I 
make it a rule to add structured code to my 
programs. In my experience programming 
at work, it's been worth it. 

Final Thoughts 

If nothing else, I hope that I've con- 
vinced you that time spent in planning is 
later paid back, and with interest, because 
that's what structured programming is all 
about. By planning your program before 
you write it, you eliminate time wasted in 
finding out what you've forgotten; by
planning your program to fit certain control 
structures (thereby causing program flow to 
take a recognizable form), you save time by 
not having to untangle the spaghetti-like 
structures that you might otherwise come up 
with. 
Listing 1, continued:

1600 IF P=0 GOTO 1660
1610 REM IF COMP IS TO PLAY, IT MAY OR MAY NOT RESIGN
1620 IF RND>.6 GOTO 1770
1630 PRINT "ARGH! YOU'VE GOT ME. I CAN ONLY RESIGN."
1640 S=1
1650 GOTO 1770
1660 IF RND>.3 GOTO 1770
1665 PRINT " "
1670 PRINT "NO MATTER WHAT YOU DO, I'VE GOT YOU. DO YOU WANT"
1680 PRINT "TO RESIGN GRACEFULLY, OR DO WE FIGHT IT OUT?"
1690 PRINT "(1 TO RESIGN, 0 TO PLAY)"
1700 INPUT C
1710 IF C<>0&C<>1 GOTO 1700
1720 IF C=0 GOTO 1760
1730 PRINT "OK, I ACCEPT YOUR RESIGNATION. GOOD GAME."
1740 S=1
1750 GOTO 1770
1760 PRINT "OK, CLOWN, IT'S YOUR FUNERAL."
1770 REM --END OF MODULE EVALUATE--
1900 REM --CHANGE P TO REFLECT NEW PLAYER--
1910 P=1-P
1920 REM --DON'T LOOP IF END-OF-GAME (GIVEN BY S=1)--
1930 IF S<>1 GOTO 0600
1940 PRINT " "
1950 PRINT "DO YOU WANT TO PLAY ANOTHER GAME? (1=YES, 0=NO)"
1960 INPUT C
1970 IF C<>1&C<>0 GOTO 1960
1980 IF C=0 GOTO 2010
1990 N1=N1+1
2000 GOTO 0200
2010 PRINT "OK, CALL ME BACK WHEN YOU'VE GOT MORE TIME."
2020 END

31)1)0 EMM I N Ii il I P R U- II R A rt 



Structured programming in its broadest 
sense is several things. On the highest level, 
it is completely knowing the problem. On 
a middle level, it is the recursive process of 
repeatedly breaking a problem into subpro- 
blems until each subproblem, at whatever 
level, presents a self-evident solution. (Also, 
this level requires some awareness of the 
basic control structures.) On the lowest 
level, structured programming is writing 
each subproblem (using one of several 
given control structures) so that program 
flow is standardized to one of several recog- 
nizable and easily traced patterns. 

It's strange that computer programmers 
took so long to analyze their own pro- 
gramming methods, especially since analysis 
is so necessary to the problem solving 
process. But the analysis was finally done, 
giving birth to the idea of structured pro- 
gramming. 

Structured programming is not univer- 
sally acclaimed. But the fight between pure 
structured and pure unstructured pro- 
gramming is largely an academic one. In the 
field, applied structured programming (or 
many of its techniques, under different 
names) is essential to programming complex, 
real world problems. And that means that, 
even in programs of computer experimenters, it couldn't hurt.
Decision Tables: How to Plan Your Programs



Thomas G Bohon 



        +------------------+-------------------+
  IF    | CONDITION STUB   | CONDITION ENTRY   |  (condition statement)
        +------------------+-------------------+
  THEN  | ACTION STUB      | ACTION ENTRY      |  (action statement)
        +------------------+-------------------+

Figure 1: Basic elements of a decision table. A decision table is a formal listing of a series of interconnecting facts and possible alternative actions associated with a particular situation or process.



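+----------------+--------+--------+-----+--------+
| TABLE HEADER   | Rule 1 | Rule 2 | ... | Rule m |
+----------------+--------+--------+-----+--------+
| Row 1          |        |        |     |        |
| Row 2          |        |        |     |        |
| Row 3          |        |        |     |        |
| ...            |        |        |     |        |
| Row n          |        |        |     |        |
+----------------+--------+--------+-----+--------+

Figure 2: Additional decision table elements. The table header (or name) allows each table to be uniquely referenced.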



[Extended entry example with rules 1, 2, 3 and ELSE. Condition stubs: "Compare Amount to Discount Amount" (entries such as "Amt Ls Disc" and "Amt Gr Disc") and "Compare Quantity to Quantity on Hand" ("Qty Ls on Hand", "Qty Gr on Hand"). Action stubs: "Billing Rate" ("Regular", "Discount"), "Quantity to Ship" ("Ordered") and "Investigate" ("ERROR").]

Figure 3: Example of an extended entry decision table.





MIXED ENTRY EXAMPLE                       1        2    3           ELSE
Ordered > Discount Amount                 N        Y    Y           -
Buyer Type                                Retail   -    Wholesale   -
Give Discount Billing                     -        X    -           -
Back Order Ordered Less on Hand Amount    X        -    X
Investigate Error                         -        -    -           X

Figure 4: Example of a mixed entry decision table.



"Oh, no," you say to yourself, "Another 
one of those fancy techniques which no 
one can understand, I can't use, and I can 
definitely get along without!" 

Did something like the above pass 
through your mind when you read the title 
of this article? Well, put aside your doubts 
for a second and read a bit further. I think 
you'll be pleasantly surprised to learn that 
you already know the process I'm going to 
describe and, in fact, probably use it in a 
very informal way every day. All I want to 
do here is to formalize what you already 
know and show you how you can apply 
this knowledge to make the job of program- 
ming your home computer a little easier. 

What am I talking about? Decision 
tables, of course. And, after reading this 
article, you should have a better understanding of what they are, how they are constructed, and how to use them effectively.

Some Definitions Before We Begin

A decision table is simply a formalized 
presentation of the mental process each of 
us goes through every time we are con- 
fronted with a series of facts which require 
us to decide on one course of action or 
another. Stated another way, a decision 
table is merely the writing down of the 
facts and possible alternative actions asso- 
ciated with a particular situation or process. 

In programming, decision tables act as 
effective substitutes for, or as an aid to, 
the block diagrams associated with prelim- 
inary flowcharting. They are used primarily 
when the situation being studied involves 
complex decision logic, since the decision 
table presents not only the original con- 
dition but also the course of action in 
an easy to understand and easy to use 
tabular form. 

There are two main sections of a decision table (see figure 1). The upper section (shown as exactly half of the table, although an actual table need not divide evenly) presents the possible conditions upon which the decision will be based. The lower portion (again, not necessarily half of the table) presents all possible actions resulting from the possible decisions in the upper portion.

Each portion of the table is further 
broken up into two sections, with the left 
hand section being called the stub and the 
right hand section called the entry. Thus, 
in our typical decision table we have a 
condition stub and a condition entry in the 
upper portion, and an action stub and an 
action entry in the lower portion. 

Figure 2 shows the remaining elements of a decision table. Note that there is a table header (sometimes called the label or name) which allows each table to be uniquely referenced. This is necessary in complex situations where the conditions and actions may require multiple tables.

Each rule in the entries is identified
by a rule number. The condition stub
describes a condition in a way that may 
be answered either yes or no (in one kind 
of table) or with a specific value. The 
condition entry provides the means of 
completing the condition statement. The 
action stub describes the action(s) to be 
taken, while the action entry provides 
the means of showing completion of the 
actions. 
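A limited-entry decision table maps naturally onto a small data structure: a list of condition stubs, a list of action stubs, and one column of entries per rule. The entries are Y, N or - for conditions, and actions are marked with an X. This Python sketch makes two assumptions: rules are checked in order, and the first matching rule wins. All names are hypothetical. The example table restates the end-of-game check from the NIM program's evaluate module.

```python
from dataclasses import dataclass


@dataclass
class LimitedEntryTable:
    header: str                         # table header (label or name)
    condition_stubs: list[str]          # conditions answerable yes or no
    action_stubs: list[str]             # complete actions to be taken
    rules: list[tuple[str, set[str]]]   # per rule: Y/N/- entries, actions marked X

    def decide(self, facts: dict[str, bool]) -> tuple[int, list[str]]:
        """Return (rule number, actions) for the first rule whose entries all hold."""
        for number, (entries, marked) in enumerate(self.rules, start=1):
            if all(entry == "-" or (entry == "Y") == facts[stub]
                   for stub, entry in zip(self.condition_stubs, entries)):
                # Report actions in the order of the action stubs.
                return number, [a for a in self.action_stubs if a in marked]
        raise LookupError(f"{self.header}: no rule covers {facts}")


end_of_game = LimitedEntryTable(
    header="END-OF-GAME",
    condition_stubs=["one stick left", "computer just moved"],
    action_stubs=["print computer-loses message",
                  "print user-loses message", "continue play"],
    rules=[
        ("YN", {"print computer-loses message"}),
        ("YY", {"print user-loses message"}),
        ("N-", {"continue play"}),
    ],
)
```

A call such as `end_of_game.decide({"one stick left": True, "computer just moved": True})` selects rule 2. An input that no rule covers raises an error instead of silently doing nothing.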



Decision tables are generally classified by the type of information recorded in the entries. There are three generally accepted types:

• Limited Entry: This is the most widely used and, because of its similarity to binary logic, is most suited for computer oriented applications. Condition entries are limited to a Y, N or - (meaning not applicable). Action entries are limited to Xs. In order to accomplish this, the condition stub must be written so that a true-false condition exists, and the action stub must describe the complete action to be taken. The example in this article will be of this type.
• Extended Entry: In this type of decision table, the entry portion is merely an extension of the stub portion. The stub describes the variable and the entry describes the possible values which the variable can assume. This type of table is well suited for situations in which only a few variables occur but those few variables may assume many different values. Figure 3 is an example of this type of table.



Note: For those readers 
who would like to learn 
more about decision tables, 
I recommend the following 
books: 

Automatic Data Process- 
ing: Principles and Pro- 
cedures by E Awad and 
DPMA 

Decision Tables and Their 
Practical Application in 
Data Processing by 
Thomas Gildersleeve. 

Both of these books are 
published by Prentice-Hall,
1970. I would also be 
happy to answer any ques- 
tions raised by my article. 



[Figure 5 panels: Closed Table Example #1 (rules 1 to 3, whose actions include DO 2 and DO 3); Closed Table Example #2 (rules 1 and 2, each ending with RETURN); Closed Table Example #3 (rules 1 to 3, each ending with EXIT); Open Table Example #1 (rules 1 to 3, each ending with GO TO 2); Open Table Example #2 (rules 1 and 2).]

Figure 5: Examples of open and closed decision tables. An open table has as its last action in each rule a branch to the next table in the series; closed tables return control to the tables that call them upon completion of their routines.
• Mixed Entry: As the name implies, this type of decision table has rows which contain either limited or extended entries. See figure 4 for an example.

As mentioned above, complex situations
may require more than one table, or you 
may place different types of decisions in 
different tables. Obviously, there must be 
a way for one table to reference another 
and indeed there is. How? Simply by the 
type of table you build. An open table 
has as its last action in each rule a branch 
to the next table in the series. This transfer 
is a permanent one and is accomplished by 
an action stub of GO TO n. A closed table, 
on the other hand, uses an action stub of 
DO n or PERFORM n with the idea that, 
after the called table is completed, control 
will return to the calling table and the 
indicated actions from that point on will 
continue. Return from the called table is 
through an EXIT or RETURN action stub.
Figure 5 gives examples of both open and 
closed decision tables. 
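One way to see the difference between GO TO and DO is to model each table as a Python function. In this minimal sketch, the table numbers, function names, and log messages are all hypothetical. An open table ends by naming the next table, and a small dispatcher follows that chain for good. A DO n action is an ordinary function call, so when the closed table finishes (its RETURN or EXIT), the calling table picks up where it left off.

```python
from typing import Callable, Dict, List, Optional

Log = List[str]


def closed_table_3(log: Log) -> None:
    """A closed table: its rule does its work, then RETURN (EXIT)."""
    log.append("table 3: actions")
    # Returning from the function is the RETURN: control goes back to the caller.


def open_table_1(log: Log) -> Optional[int]:
    """An open table that also uses DO 3 partway through a rule."""
    log.append("table 1: actions")
    closed_table_3(log)                 # DO 3 / PERFORM 3: call, then resume here
    log.append("table 1: resumed after DO 3")
    return 2                            # GO TO 2: hand control to table 2 for good


def open_table_2(log: Log) -> Optional[int]:
    log.append("table 2: actions")
    return None                         # last table in the series


OPEN_TABLES: Dict[int, Callable[[Log], Optional[int]]] = {
    1: open_table_1,
    2: open_table_2,
}


def run_series(start: int) -> Log:
    """Follow GO TO transfers from table to table until the series ends."""
    log: Log = []
    current: Optional[int] = start
    while current is not None:
        current = OPEN_TABLES[current](log)
    return log
```

Here run_series(1) logs table 1, then table 3, then table 1 again because the DO came back, and finally table 2. Table 1 never resumes after the GO TO.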

How to Construct a Decision Table 

The first step in constructing an effective 
decision table is to state the problem in a 
clear and concise manner. For example, 
suppose we wish to construct a table for the 
following hypothetical situation: 

Your firm, which manufactures fridgets for home computers, often sells on credit. If a customer places an order which exceeds his/her previously established limit, the order should be forwarded to the credit manager for approval prior to filling and shipping it. However, if the customer has purchased more than $600 in the past six months, he/she is considered a regular customer, and in such cases tentative approval is assumed and the order is filled but not shipped until credit approval is received. There is also a minimum order value of $100 from all customers, and all orders less than this amount must be returned unfilled unless the order is from a regular customer, in which case it may be filled and shipped. All orders over $500 in value receive a 10% discount and all orders over $750 receive an additional 5% discount. However, the discounts apply only for regular customers as defined above.

By stating the situation as we have, we 
have completed the first step in our decision 
table construction (I realize that I said the
statement should be clear and concise, but 
we have to have something to work with!). 

The second step in our construction process is to isolate and list both the conditions which will affect our eventual decision and the possible actions we may take:

Conditions

Regular customer
Order exceeds credit limit
Order less than minimum
Order less than $500
Order is over $500, less than $750
Order is over $750

Actions

Request credit approval
Fill the order
Ship the order
Reject the order
Give 10% discount
Give 15% discount
No discount



At this point we should stop and examine 
our lists for correctness and add any items 
which have been omitted. In our example, 
the last three items in each list are redundant: 
obviously, a single order cannot possibly 
require all three checks nor is it necessary to 
keep all three actions. It would be much 
simpler to check each order for "over $500" 
and "over $750," assuming that the only 
possible other condition will be "under 
$500." Similarly, instead of listing all three
discount possibilities, why not list "give 
10%" and "give an additional 5%" - this 
covers all possibilities. After our examina- 
tion and the elimination of these redundant 
conditions, we have the following revised 
lists [Note: There is no implied relationship 
between Conditions and Actions at this 
point]: 



Conditions

Regular customer
Order exceeds credit limit
Order less than minimum
Order over $500
Order over $750

Actions

Request credit approval
Reject the order
Give 10% discount
Give additional 5% discount
Fill the order
Ship the order 
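The following small Python sketch checks that nothing is lost by the revision. It shows that the two remaining amount tests and the two cumulative discount actions still produce all three original discount outcomes. The function name and the use of whole-number percentages are our own choices. We also apply the regular-customer restriction from the problem statement. Treat it as an illustration of the arithmetic, not as part of the table itself.

```python
def discount_percent(regular_customer: bool, amount: float) -> int:
    """Discount implied by the two revised amount conditions."""
    if not regular_customer:        # discounts apply only to regular customers
        return 0
    percent = 0
    if amount > 500:                # "Order over $500": give 10% discount
        percent += 10
    if amount > 750:                # "Order over $750": give an additional 5%
        percent += 5
    return percent                  # 0, 10, or 15: the three original cases
```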



The next step is to place these conditions 
and actions into a formal table structure. 
A general rule to follow when constructing 
the actual table is to list the actions in the 
order in which they are to be performed. 
Further, a condition entry is left blank 
(not applicable) only if the condition is 
either not possible or is overshadowed by 
other conditions also present. Our table, 
in skeleton form, appears as in figure 6. 
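In the figures that follow, a not-applicable entry is shown as a dash. A rule matches the current situation when every Y entry holds and every N entry does not; dashes are ignored.

The Python sketch below encodes a rule column as a string over the six condition stubs of figure 6, in order. The function names are hypothetical, and the code falls back to the table's ELSE column when no rule matches. Raising an error when several rules match is our own addition. It is a cheap way of catching the redundancy and contradiction discussed later.

```python
from typing import Dict, List, Sequence

# One rule column, one entry per condition stub (in stub order):
# "Y" = condition must hold, "N" = must not hold, "-" = indifferent (blank).
Rule = str


def rule_matches(rule: Rule, conditions: Sequence[bool]) -> bool:
    if len(rule) != len(conditions):
        raise ValueError("rule and condition list differ in length")
    for entry, holds in zip(rule, conditions):
        if entry == "Y" and not holds:
            return False
        if entry == "N" and holds:
            return False
    return True  # every non-blank entry agreed


def select_rule(rules: Dict[str, Rule], conditions: Sequence[bool],
                else_rule: str = "ELSE") -> str:
    """Name of the single matching rule, or the else rule if none matches."""
    hits: List[str] = [name for name, rule in rules.items()
                       if rule_matches(rule, conditions)]
    if len(hits) > 1:
        raise ValueError(f"rules {hits} overlap: redundancy or contradiction")
    return hits[0] if hits else else_rule
```

Take the first three columns of the filled-in table: "YYN---", "YYY---" and "YN-Y--". A regular customer who exceeds the credit limit without approval then selects rule 1, and a situation none of them covers selects ELSE.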

After we have filled out the condition 
and action stubs of our table, we must 
complete the entry portion by filling in the 
rules. This is accomplished by returning to 
the original problem statement and carefully 
marking the condition entries and the asso- 
ciated action entries. This is shown in 
figure 7a.

The final step in building our decision table is to ensure completeness and to eliminate both redundancy and contradiction. Contradiction is best eliminated by careful examination of the problem statement to ensure that the conditions and actions we entered into the table earlier do not contradict each other. Ensuring completeness is fairly simple if we understand the "else rule." Put simply, this rule says that, if none of the other rules listed hold, we still have a specific action to take. In the case of our table in figure 7a, the "else rule" says we are to investigate the error condition.

    Sample Table

    Rule                            1  2  3  4  5  6  7  8  9  10
    Regular customer
    Order exceeds credit limit
    Credit approval received
    Order less than minimum amount
    Order > $500, less than $750
    Order > $750
    Request credit approval
    Give 10% discount
    Give additional 5% discount
    Fill the order
    Ship the order
    Reject the order
    Investigate error

Figure 6: A preliminary decision table based on the example in text.

    Sample Table

    Rule                            1  2  3  4  5  6  7  8  9  10 (ELSE)
    Regular customer                Y  Y  Y  Y  Y  N  N  N  N
    Order exceeds credit limit      Y  Y  N  N  N  Y  Y  N  N
    Credit approval received        N  Y  -  -  -  Y  N  -  -
    Order less than minimum amount  -  -  Y  N  N  -  -  Y  N
    Order > $500, less than $750    -  -  -  Y  N  -  -  -  -
    Order > $750                    -  -  -  -  Y  -  -  -  -
    Request credit approval         X  -  -  -  -  -  X  -  -  -
    Give 10% discount               -  -  -  X  X  -  -  -  -  -
    Give additional 5% discount     -  -  -  -  X  -  -  -  -  -
    Fill the order                  X  X  X  X  X  X  -  -  X  -
    Ship the order                  -  X  X  X  X  X  -  -  X  -
    Reject the order                -  -  -  -  -  -  -  X  -  -
    Investigate error               -  -  -  -  -  -  -  -  -  X

Figure 7a: A skeleton decision table developed from the preliminary table in figure 6.

    Corrected Table

    Rule                            1    2&6  3    4    5    7    8    9    10 (ELSE)
    Regular customer                Y    -    Y    Y    Y    N    N    N
    Order exceeds credit limit      Y    Y    N    N    N    Y    N    N
    Credit approval received        N    Y    -    -    -    N    -    -
    Order less than minimum amount  -    -    Y    N    N    -    Y    N
    Order > $500, less than $750    -    -    -    Y    N    -    -    -
    Order > $750                    -    -    -    -    Y    -    -    -
    Request credit approval         X    -    -    -    -    X    -    -    -
    Give 10% discount               -    -    -    X    X    -    -    -    -
    Give additional 5% discount     -    -    -    -    X    -    -    -    -
    Fill the order                  X    X    X    X    X    -    -    X    -
    Ship the order                  -    X    X    X    X    -    -    X    -
    Reject the order                -    -    -    -    -    -    X    -    -
    Investigate error               -    -    -    -    -    -    -    -    X

Figure 7b: The final corrected decision table for the example in text.

Redundancy 

Eliminating redundancy is a bit more 
complicated. There are various rules and 
methods for doing this, and we will discuss 
only one of them. Certainly this is not the 
only "right" method. Also, keep in mind that
throughout the following discussion we are 
dealing only with two rules which have the 
same indicated actions. 

The first law for eliminating redundancy
says: 

If, with the exception of one condi- 
tion, two rules have the same condition 
entries and, for that one condition, one 
rule has a Y entry and the other an N
entry, then the two rules can be com- 
bined into one rule with the entry for 
that condition becoming indifferent (not 
applicable). 

Let's apply this law to our table in figure 7a. Note that rules 2 and 6 fit the criteria: they have the same action entries, and their condition entries are identical except for Regular customer (Y in rule 2, N in rule 6), making them candidates for combination. We can thus combine these two rules, with the result shown in figure 7b. Note that rules 3
and 9 almost fit our criteria for possible 
elimination: the only difference is that 
there are two conditions with different 
entries, and we are allowed only one by our 
rule. No other rule pairs fit the criteria and, 
after combining the two rules as in figure 7b, 
we may safely assume that our table passes 
this first law of redundancy elimination 
processing. 
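The first law is mechanical enough to code. This Python sketch uses a function name of our own. It assumes the caller has already checked that the two rules have identical actions. It merges them only when exactly one condition differs and that difference is Y against N. Rule strings follow the stub order of figure 7a.

```python
from typing import Optional


def combine_rules(rule_a: str, rule_b: str) -> Optional[str]:
    """First law: merge two rules (assumed to have the same actions) whose
    condition entries differ in exactly one place, Y in one and N in the other."""
    if len(rule_a) != len(rule_b):
        raise ValueError("rules must cover the same conditions")
    diffs = [i for i, (a, b) in enumerate(zip(rule_a, rule_b)) if a != b]
    if len(diffs) != 1:
        return None                       # identical, or more than one difference
    i = diffs[0]
    if {rule_a[i], rule_b[i]} != {"Y", "N"}:
        return None                       # e.g. Y against "-" does not qualify
    return rule_a[:i] + "-" + rule_a[i + 1:]
```

With rules 2 and 6 written as "YYY---" and "NYY---", the result is "-YY---", the "2&6" column of figure 7b. Rules 3 and 9 ("YN-Y--" and "NN-N--") return None because they differ in two places.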

The next test to apply can be stated as 
follows: 

Each pair of rules remaining after 
application of the test above must have 
at least one condition for which one rule 
has a Y entry and the other an N entry. 

Those pairs of rules which meet this test 
are said to be independent of each other, 
while those which fail this test are said to 
be dependent on each other. Dependency 
at this point in our tests indicates that the 
table still contains either redundancy (it 
has a dependent rule pair with the same 
actions) or contradiction (there is a
dependent rule pair with different actions). Let's
examine our table. 

Pairing each rule with each of the others, 
one at a time (e.g., pair 1 with 2, then 1 with 3,
and so on until every pair has been examined), we check the
conditions for a Y in one rule and an N in 
the other. This isn't as time-consuming as it 
appears, since we can assume the pair is 
independent upon encountering the first 
occurrence of the Y-N condition. We can 
see, after examining all rule pairs, that none 
of them are dependent. We can therefore 
assume that our table is indeed nonredun- 
dant and that it does not contain any 
contradictions. 
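The pairwise test can be sketched the same way. Here each rule of figure 7b is a string of condition entries in stub order, using our own transcription of the figure. dependent_pairs returns every pair that has no Y-N conflict. The any() call stops at the first conflicting condition, just as the manual check does.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def independent(rule_a: str, rule_b: str) -> bool:
    """True if some condition is Y in one rule and N in the other."""
    return any({a, b} == {"Y", "N"} for a, b in zip(rule_a, rule_b))


def dependent_pairs(rules: Dict[str, str]) -> List[Tuple[str, str]]:
    """Every pair of rules that fails the independence test."""
    return [(x, y) for x, y in combinations(rules, 2)
            if not independent(rules[x], rules[y])]


# Condition entries of figure 7b (regular, over limit, approval, < minimum,
# $500 to $750, > $750), as we transcribe the figure.
FIGURE_7B = {
    "1": "YYN---", "2&6": "-YY---", "3": "YN-Y--", "4": "YN-NY-",
    "5": "YN-NNY", "7": "NYN---", "8": "NN-Y--", "9": "NN-N--",
}
```

For the corrected table, dependent_pairs(FIGURE_7B) is an empty list, which agrees with the conclusion above.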

Note: if we had found a dependent rule 
pair, we would have had to apply the 
following rules to eliminate the redundancy: 

1. If one rule is pure and the other mixed, then the pure rule is contained in the mixed rule and the pure rule may be eliminated. (A pure rule is one in which all entries are either Y or N, while a mixed rule also contains indifferent (-) entries.)

2. If both rules are mixed, there is at least one pure rule which is common to both, and you can eliminate it from one of the original rules.

We won't go into these here, since they 
usually appear only in more complicated 
applications. I mention them simply to make 
our discussion complete. 
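If a dependent pair does turn up, the two repair rules are easier to apply once a mixed rule is expanded into the pure rules it stands for: each dash becomes both a Y and an N. Below is a hypothetical Python sketch. It assumes, as we read the text, that a mixed rule is one containing indifferent entries.

```python
from itertools import product
from typing import List


def pure_rules(rule: str) -> List[str]:
    """Expand each indifferent '-' entry into both Y and N."""
    choices = [("Y", "N") if entry == "-" else (entry,) for entry in rule]
    return ["".join(combo) for combo in product(*choices)]


def contains(mixed: str, pure: str) -> bool:
    """Rule 1: a pure rule contained in a mixed rule is redundant."""
    return pure in pure_rules(mixed)


def common_pure_rules(rule_a: str, rule_b: str) -> List[str]:
    """Rule 2: the pure rules two mixed rules share (candidates to drop from one)."""
    shared = set(pure_rules(rule_b))
    return [p for p in pure_rules(rule_a) if p in shared]
```

For example, "Y-" and "-N" share the single pure rule "YN", which could be dropped from one of them.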

Once our decision table is built and we 
have completed the error checking pro- 
cedures mentioned above, we can use the 
table as a basis for either a preliminary flow- 
chart or, with the addition of the necessary 
I/O routines, go directly to the coding phase
of our programming. The path we take at 
this point depends entirely on how carefully 
we have constructed our table. 
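As an example of going straight from table to code, here is a Python sketch of figure 7b as we read it. Several pieces are assumptions:

- the Order record;
- the $100 minimum, taken from the problem statement;
- our reading of the amount stubs as 500 < amount < 750 and amount > 750.

Actions are listed in table order, and a situation that matches no column falls to the ELSE rule. As transcribed, a regular customer's in-limit order between the minimum and $500 matches no column, so this code routes it to "Investigate error". That is exactly the kind of gap the else rule exists to expose.

```python
from typing import List, NamedTuple, Sequence, Tuple

MINIMUM_ORDER = 100  # minimum order value from the problem statement


class Order(NamedTuple):
    regular_customer: bool
    exceeds_credit_limit: bool
    credit_approval_received: bool
    amount: float


def conditions(order: Order) -> Tuple[bool, ...]:
    """Condition stubs of figure 7b, in table order."""
    return (
        order.regular_customer,
        order.exceeds_credit_limit,
        order.credit_approval_received,
        order.amount < MINIMUM_ORDER,
        500 < order.amount < 750,     # "Order > $500, less than $750"
        order.amount > 750,           # "Order > $750"
    )


# (condition entries, actions in table order) for each column of figure 7b
TABLE: List[Tuple[str, List[str]]] = [
    ("YYN---", ["Request credit approval", "Fill the order"]),              # 1
    ("-YY---", ["Fill the order", "Ship the order"]),                       # 2 & 6
    ("YN-Y--", ["Fill the order", "Ship the order"]),                       # 3
    ("YN-NY-", ["Give 10% discount", "Fill the order", "Ship the order"]),  # 4
    ("YN-NNY", ["Give 10% discount", "Give additional 5% discount",
                "Fill the order", "Ship the order"]),                       # 5
    ("NYN---", ["Request credit approval"]),                                # 7
    ("NN-Y--", ["Reject the order"]),                                       # 8
    ("NN-N--", ["Fill the order", "Ship the order"]),                       # 9
]
ELSE_ACTIONS = ["Investigate error"]                                        # 10


def matches(rule: str, values: Sequence[bool]) -> bool:
    # "-" is indifferent; otherwise Y must meet True and N must meet False.
    return all(e == "-" or (e == "Y") == v for e, v in zip(rule, values))


def decide(order: Order) -> List[str]:
    values = conditions(order)
    for rule, actions in TABLE:
        if matches(rule, values):
            return actions
    return ELSE_ACTIONS
```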

Conclusion 

We have seen how we can go from a gen- 
eralized problem statement to a list of pos- 
sible conditions and actions to a completely 
checked out and (we hope) error-free de- 
cision table. Of course, like any other new 
procedure, you will have to use it several 
times before you become comfortable with 
the process. But no matter how difficult or 
complicated it seems, I urge you to try it 
not once but several times in actual program- 
ming situations. After doing so, I'm sure 
you'll agree that using decision tables greatly 
increases your productivity and eliminates 
the situation in which, almost at the end of 
a long program, you discover one little 
condition you forgot back at the beginning, 
which is where you end up again in short 
order!
Programming Entomology



Gary McGath 



An entomologist is a bug expert. When he
sees an insect, it isn't just a bug to him (in
fact, he will vociferously protest that not all
insects are bugs); it has a particular habitat, 
lifespan, favorite food, and breeding pattern. 
Nor is his knowledge just academic; he can 
tell you how to protect yourself from a 
harmful one by killing it or keeping it away. 
The same sort of knowledge is necessary 
for programming. The skilled programmer 
knows what kinds of bugs may attack a 
program, how to track them down, and how 
to keep them from getting there in the first 
place. He knows the ways to get at particular 
bugs, as well as the general treatments which 
are effective against all of them. 

The first thing to realize about bugs 
is that they don't appear by spontaneous 
generation. They have a creator, and their 
creator is the programmer. (Throughout 
this article, I am speaking only of user 
program bugs; hardware bugs are an entirely 
different breed, subject to different laws, 
and systems software may be beyond your 
control.) No matter how outrageously the 
program is acting, it's only following orders. 
So what you have to ask about a bug in 
your program is: how did you put it there? 
What kinds of mistakes are you prone to 
make? If you caught a certain bug in one 
part of the program, might you have put 
the same kind of bug elsewhere as well? 
"Thou art God" . . . and thou must take
care of thy creation. 

But the fact that each programmer 
creates his own bugs doesn't mean there 
aren't species of bugs found in everyone's 
programs. Knowing about these species can 
be a great timesaver, especially when the 
species can be identified by the effects. 

One of the most common bugs is the 
Clobbered Value, found where the pro- 
grammer assumes the content of a register 
or the value of a variable is the same as 
before, but it isn't. Take this attempt to 
exchange the values of two variables: 

10 LET X = Y
20 LET Y = X

This fails because when statement 20 is
executed, the value of X has already been
clobbered by the previous statement, with 
the result that Y never gets changed at all. 
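Here is a quick Python illustration of the same mistake and the usual cure, a temporary variable. In BASIC terms, the cure is LET T = X before the two assignments and LET Y = T afterwards. Python could swap with a single tuple assignment, but the explicit temporary mirrors what a BASIC program has to do. The function names are hypothetical.

```python
from typing import Tuple


def broken_swap(x: int, y: int) -> Tuple[int, int]:
    x = y        # 10 LET X = Y : the old value of X is clobbered here
    y = x        # 20 LET Y = X : copies the new X, which is just Y again
    return x, y


def swap(x: int, y: int) -> Tuple[int, int]:
    temp = x     # save X before it is overwritten
    x = y
    y = temp
    return x, y
```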

Clobbered Values are frequently found 
on subroutine exits. It's easy to write a 
harmless looking CALL or GOSUB (possibly 
to a routine you haven't written yet) and 
assume everything will remain the same. But 
strange things can happen if the subroutine 
unexpectedly changes some values. 

A not too distant relative of the Clob- 
bered Value is the Zapped Stack, found only 
in machine and assembly code. It appears
most often when a subroutine pushes items onto the program's stack at its start,
then fails to pop them, or pops too
many things at the end. Another way to
invite this bug is to use the stack pointer 
for some other purpose during the course 
of a subroutine. 
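An unbalanced PUSH/POP is so destructive because of how calls work on a machine like the 8080. The CALL instruction pushes the return address onto the same stack the subroutine uses for its own saves, and RET simply pops whatever is on top. The Python sketch below simulates just that much; the addresses are made up.

```python
from typing import List

RETURN_ADDRESS = 0x0103      # hypothetical address of the instruction after CALL


def ret_target(pushes: int, pops: int) -> int:
    """Simulate CALL, a subroutine's PUSHes and POPs, then RET.

    Returns the address RET would jump to (the value left on top of the stack).
    """
    stack: List[int] = [RETURN_ADDRESS]        # CALL pushes the return address
    for n in range(pushes):
        stack.append(0xAA00 + n)               # subroutine saves hypothetical register values
    for _ in range(pops):
        stack.pop()                            # ...and restores some of them
    return stack.pop()                         # RET pops whatever is on top
```

With balanced saves and restores, RET finds 0x0103. With one POP missing, it "returns" to 0xAA00, a saved register value. With one POP too many, this sketch raises IndexError; a real processor would just pop whatever lies above the stack.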

Subroutines are also the habitat of the 
Botched Call. A certain protocol is needed 
to call any particular subroutine. If, when 
you write a call to a subroutine, you expect 
a value to be returned in the wrong place, 
or you assume the subroutine will do some- 
thing which it actually won't (or vice versa), 
this bug will have gained a foothold. The 
difference between a Clobbered Value and 
a Botched Call is that when you have the 
latter, the subroutine is doing the right 
thing; the calling program is just mistaken 
in its expectations. 
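Here is a small, hypothetical Python example of a Botched Call. The subroutine keeps its promise: it returns a new sorted list and leaves the argument alone. One caller, however, assumes the argument was sorted in place.

```python
from typing import List


def sorted_copy(values: List[int]) -> List[int]:
    """Contract: RETURNS a new sorted list; the argument is left untouched."""
    return sorted(values)


def botched_caller(data: List[int]) -> List[int]:
    sorted_copy(data)            # Botched Call: result ignored, data assumed sorted
    return data


def correct_caller(data: List[int]) -> List[int]:
    return sorted_copy(data)     # takes the result from where the routine puts it
```

The subroutine is correct in both cases; only the calling code's expectations differ.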

Another species of bug lurks in jumps, branches, and GOTOs. The Branch Bug is so difficult to fight that serious attempts have been made to wipe out its habitat; languages and programming styles (structured programming) have been developed that use no jumps. The Branch Bug comes in two varieties: jumping to the wrong place, and jumping to the right place with inadequate preparation. The first of these is easy to produce in languages where statement labels have to be numbers (eg: BASIC and FORTRAN, especially BASIC, where every statement has to be numbered whether it's ever going to be a jump destination or not). The jump with inadequate preparation is similar to the Botched Call, but it can often be harder to figure out if the program has a complex flow pattern.

Clobbered Value Bug: Your program changes the value of a variable at a time and place which is unintended. The detection difficulty ranges from the obvious (after it is found) to the subtle (before it is found).

Zapped Stack Bug: Stack oriented machines and software are both very egalitarian with respect to pushes and pops. They like to have the same number of items pushed as are later popped, or else they'll transform themselves from tranquil and placid programs into memory zapping monsters.

Botched Call Bug: The Botched Call Bug is like the proverbial square peg in a round hole: unless the peg or the edge of the hole yields, sparks will fly.

A few special methods are applicable 
to fighting the Branch Bug. One of these 
is program flow analysis. A look at the 
possible paths a program can take will 
often reveal some of these bugs. Is there 
a part of the program that can never be 
reached? Are there traps in the program, 
loops that can never terminate? Are there 
jumps which will result in variables being 
used without having been set to a value? 
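Flow analysis of this kind can be partly automated once the program's jumps are written down as a graph. In this Python sketch, a hypothetical BASIC-style program is described by hand as a map from each statement number to the statements that can follow it. A breadth-first search then finds two kinds of statement:

- statements that can never be reached;
- traps, meaning statements from which END can never be reached.

Detecting variables used before being set would need more information than this graph holds.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical program: statement number -> statements that may run next.
FLOW: Dict[int, List[int]] = {
    10: [20],
    20: [30, 60],     # IF ... THEN 60
    30: [40],
    40: [20],         # GOTO 20 (loop back)
    50: [60],         # nothing jumps here
    60: [70, 80],     # IF ... THEN 80
    70: [],           # END
    80: [80],         # 80 GOTO 80: a trap
}


def reachable(flow: Dict[int, List[int]], start: int) -> Set[int]:
    """Every statement that some path from start can reach (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        line = queue.popleft()
        for nxt in flow.get(line, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def unreachable(flow: Dict[int, List[int]], start: int) -> List[int]:
    return sorted(set(flow) - reachable(flow, start))


def traps(flow: Dict[int, List[int]], start: int, end: int) -> List[int]:
    """Reachable statements from which the END statement can never be reached."""
    return sorted(s for s in reachable(flow, start)
                  if end not in reachable(flow, s))
```

For this graph, statement 50 is unreachable and statement 80 is a trap.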

In languages like BASIC, where every 
statement is labeled, it's helpful to set off 
statements that can be reached by jumps 
either by using special statement numbers 
or by pointing them out in comment statements.
In any language, the statements
that can be reached by jumps should be 
logical breaking points in some sense, 
places where a new unit of work begins. 
Except in desperate situations where 
economy is all-important, jumps should 
be used to satisfy the logic of the program, 
not to save a few instructions. 

If a subroutine call can be used instead 
of a jump, it probably should be used. A 
subroutine will send you back where you 
came from, so figuring out the flow of the 
program is easier. For many purposes, 
you can treat a subroutine as a unit when 
studying the program; as a single instruction 
that happens to do complicated things. 
You can't do this with the instructions 
reached by a jump. 

The next bug in our survey feeds on 
apples and oranges. More generally speaking, 
the Mismatched Unit is found where the 
units or dimensions of the quantities being 
used in a program aren't the ones actually 
needed. Take the program statement LET 
V = D * T, where D is a distance in miles, 
T is the time traveled in hours, and V is
intended to be the traveler's average velocity
in miles per hour. By using simple algebra
on the units, you can see that the result 
obtained will be units of miles times hours, 
not miles per (ie: divided by) hour. 
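One way to make units visible is to carry them along with the values. This Python sketch uses a Quantity class of our own, not one from any library. It keeps an exponent for each base unit, so multiplying and dividing combine units the way the algebra above does.

```python
from typing import Dict

Units = Dict[str, int]  # base unit -> exponent, e.g. {"mile": 1, "hour": -1}


def merge_units(a: Units, b: Units, sign: int) -> Units:
    """Add (sign=+1) or subtract (sign=-1) the exponents of b from a."""
    out = dict(a)
    for unit, power in b.items():
        out[unit] = out.get(unit, 0) + sign * power
    return {unit: power for unit, power in out.items() if power != 0}


class Quantity:
    def __init__(self, value: float, units: Units) -> None:
        self.value = value
        self.units = units

    def __mul__(self, other: "Quantity") -> "Quantity":
        return Quantity(self.value * other.value, merge_units(self.units, other.units, +1))

    def __truediv__(self, other: "Quantity") -> "Quantity":
        return Quantity(self.value / other.value, merge_units(self.units, other.units, -1))


D = Quantity(120.0, {"mile": 1})   # distance
T = Quantity(2.0, {"hour": 1})     # time
wrong = D * T                      # LET V = D * T  gives mile*hour
right = D / T                      # LET V = D / T  gives mile/hour
```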



Bugs of this type are harder to spot when 
the mismatched variables are further apart 
in the program, but consistency will keep 
them from occurring. Simply be sure you 
know in advance what units each variable 
has to come in. 

Assembly and machine language program- 
ming allow an especially messy type of 
Mismatched Unit to show up: mismatches 
between addresses and data, or between 
absolute addresses and relative addresses 
(values to be added to a base address). To 
avoid this bug, watch out for the different 
addressing modes of different instructions. 

Another bug with a specialized habitat is the Fencepost Bug, named for its tendency to rest in problems like this one: "If you are putting up a wire fence 100 feet long, supported by posts every 10 feet, how many posts do you need?" Another name for this bug is the Boundary Condition Bug; it's always found in connection with the start or end of some sequence, where special treatment is needed. One form manifests itself in confusion over whether the first element of a group is number 0 or number 1. Another is found in the attempt to relate each element of an array to the next, as in this statement:

IF T(I)<T(I+1) GOTO 100

Try this one setting I equal to the dimension of T.
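Both forms can be shown in a few lines of Python. The fence needs 11 posts, not 10: one at the start of each of the ten 10-foot gaps, plus one at the far end. For the array comparison, the loop must stop one element short of the end so that T(I+1) still exists. The sketch uses Python's 0-based indexing and hypothetical function names.

```python
from typing import List, Sequence


def posts_needed(fence_length: int, spacing: int) -> int:
    """Posts for a straight fence with a post at each end."""
    if fence_length % spacing != 0:
        raise ValueError("this sketch assumes the length is a multiple of the spacing")
    return fence_length // spacing + 1      # one post per gap, plus the final post


def rising_positions(t: Sequence[float]) -> List[int]:
    """Indices i where t[i] < t[i + 1] (the IF T(I)<T(I+1) test)."""
    result = []
    for i in range(len(t) - 1):             # stop early: t[len(t)] does not exist
        if t[i] < t[i + 1]:
            result.append(i)
    return result
```

Looping over range(len(t)) instead would raise an IndexError on the last pass, the Python equivalent of setting I to the dimension of T.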

Finally, we come to the most insidious 
of all bugs, the Timing Bug. The character- 
istic that makes this bug so fearsome is that 
a program infested by one may run correctly 
once but not the next time; it may even run 
correctly 99 times but fail on the hundredth, 
using exactly the same data each time. To 
make matters worse, running programs in 
single step mode will usually drive Timing 
Bugs into undetectable hiding. 

As the name suggests, the Timing Bug is one that shows up depending on the order in which asynchronous events (events that have an unpredictable relationship in time) occur. Systems that have interrupt facilities are especially prone to being attacked by Timing Bugs, since an interrupt routine may be executed at a different point in the program each time it's run. An interrupt routine may, for instance, set up certain variables to be used by the main program. If another interrupt of the same kind can occur before the variables have been processed by the main program, and if that interrupt changes those variables, unpredictable results can occur. Yet most of the time, interrupts may not occur that close together, so the bad result is said to be nonrepeatable. This means that repeated runs of the program can't be used to systematically close in on the bug.

Mismatched Unit Bug: A result of inadequate analysis of a calculation, the Mismatched Unit Bug results in strange elixirs. When both apples and oranges are thrown into the analytical engine, what is the nature of the juice which flows out?



A Timing Bug can also live on direct 
memory access (DMA). Some mass 
storage devices can read or write data in 
bulk without the intervention of the 
processor, using those memory access 
cycles which the processor doesn't use. 
The length of time a DMA transfer will 
take is, at best, very difficult to predict; 
so a Timing Bug can strike if memory 
which is accessed by DMA can be accessed 
or modified by the processor. 

Since Timing Bugs are so hard to hunt 
down, extra efforts should be made to avoid 
giving them a foothold. Be extra careful in 
writing interrupt handlers or DMA com- 
mands. Watch for places where interrupts 
need to be disabled. As for the identification
of Timing Bugs, the following rule is
useful: if you can prove, in a precise instruc- 
tion by instruction study, that what 
happened couldn't possibly have happened 
from the execution of those instructions, 
suspect a Timing Bug; something else was 
happening during the execution of those 
instructions. 
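Real interrupts cannot be replayed on demand, which is the whole difficulty, but the underlying problem can be shown deterministically. In this Python sketch, events are an explicit list of "irq" (the handler delivers a value) and "main" (the main program processes one). With a single shared variable, two interrupts in a row lose data. The queued version keeps every value; it is a common remedy, not one the article prescribes. On real hardware, the main program would still have to disable interrupts (or use an equivalent safe design) while it removes an item from such a queue.

```python
from collections import deque
from typing import Deque, List, Optional


def single_variable(events: List[str]) -> List[int]:
    """Interrupt handler leaves each new value in ONE shared variable."""
    shared: Optional[int] = None
    processed: List[int] = []
    next_value = 1
    for event in events:
        if event == "irq":                 # interrupt: handler overwrites the variable
            shared = next_value
            next_value += 1
        elif event == "main" and shared is not None:
            processed.append(shared)       # main program finally gets around to it
            shared = None
    return processed


def queued(events: List[str]) -> List[int]:
    """Same events, but the handler appends to a queue instead of overwriting."""
    pending: Deque[int] = deque()
    processed: List[int] = []
    next_value = 1
    for event in events:
        if event == "irq":
            pending.append(next_value)
            next_value += 1
        elif event == "main" and pending:
            processed.append(pending.popleft())
    return processed
```

The same program logic gives [1, 2] for the ordering irq, main, irq, main, but only [2] when the two interrupts arrive back to back.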

Incidentally, it's possible to encounter 
bugs much like Timing Bugs even without 
interrupts or DMA. An input or output 
device, such as a keyboard, is asynchronous 
with the program; the exact behavior of the 
program will depend on the behavior of 
these devices. For instance, a program 
which accepts keyboard input and accu- 
mulates it in a buffer may work fine for 
you, yet a faster typist may make it fail 
because no provision was made for the 
chance of exceeding the buffer's capacity. 
But in a situation like this, it's at least 
possible to look at every call to an input 
routine and tell what its effects might be. 
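Here is a sketch of the missing provision, in Python with hypothetical names. The buffer checks its capacity on every keystroke. It refuses, and counts, any character it cannot hold instead of writing past its end.

```python
from typing import List


class KeyBuffer:
    """Fixed-size keyboard buffer that refuses keys instead of overrunning."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.chars: List[str] = []
        self.dropped = 0

    def key_pressed(self, ch: str) -> bool:
        if len(self.chars) >= self.capacity:   # the case a slow typist never reaches
            self.dropped += 1                  # e.g. count it, or ring the bell
            return False
        self.chars.append(ch)
        return True

    def read_all(self) -> str:
        text = "".join(self.chars)
        self.chars.clear()
        return text
```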

This completes our survey of important 
species of bugs (I have nothing useful to say 
about the Common Typo, though it does 
have to be fought). Others will no doubt 
discover voracious breeds which I have 
overlooked, and perhaps they will improve 
on some of the classifications I have men- 
tioned. But knowing about the species 
which are listed here will hopefully be 
a help in identifying and killing the bugs 
in your own programs. 




Branch Bug: Jumping 
blindly about in memory, 
the Branch Bug is always 
on a collision course with 
valid execution of a 
program. 



The Timing Bug: This most subtle of all bugs spends most of its time relaxing, then suddenly takes a swipe at apparently random times.
This doesn't mean that classifying bugs is all there is to entomology, either the biological kind or the kind being discussed here. Entomology wouldn't be a science if it couldn't say things that are true of all bugs, regardless of species. What I have discussed so far is differentiation; but integration is equally important.

The basic fact that unifies all bugs is the 
one which I mentioned at the beginning of
this article: they're all creations of the pro- 
grammer. And this fact allows the use of a 
broad-spectrum killer against all bugs: 
DDT, standing for Design, Documentation, 
and Testing. Let's take them in order: 

• Design. The best way to stay bug-free 
is to write programs without bugs. This may 
sound like superfluous advice, but pro- 
grammers (myself included) are often 
tempted into writing programs quickly, 
rather than writing them well. The attempt
usually fails, since such programs typically
cost more in debugging time than was
saved in writing them.

An error born of pragmatism is to 
suppose that it doesn't matter how you 
design a program, as long as it works. 
There are two problems with this idea. 
The first is that if you use any method 
that appears to do the job, without 
regard for well organized design, it will 
be a lot harder to ever make the program 
work. The second problem is that even if 
the program works for its immediate pur- 
pose, it will be harder to make changes to 
meet new needs, since a particular ad hoc 
solution may not be generalizable. 

The first step in designing a program is 
to lay out a complete plan of attack before 
writing it. Decide what data structures you 
will need, and what method you will use. 
Data structures are often the key to the 
whole program. First plan the program in a 
few large steps; then decide what each step 
will consist of in more specific terms; then 
repeat the procedure until you're down to 
the level of your chosen programming lan- 
guage. This is the principle of structured pro- 
gramming, and also of mental unit-economy: 
avoid having to think about more things at 
once than your mind can handle. If you can 
keep everything relevant to a particular 
operation in your head, you're not likely to 
put bugs into its implementation. 

Flowcharting is often recommended for 
program design, but it's cumbersome and 
doesn't lend itself to representing a hierar- 
chical design. Another approach is to use a 
well designed programming language, such as 
ALGOL or APL, to write the design. Since 
you aren't actually going to run the program 
in that language, you can assume any fea- 
tures that would make the job easier. The 



point of this is to have a representation of 
the program that you can understand with- 
out strain, so that you don't lose sight of 
your overall plan while chasing down details 
of implementation. If you do have bugs 
after doing this, at least they won't be part 
of the whole design of the program. 

• Documentation. The main reason for
writing up the way a program works isn't to 
explain it to someone else; it's to make sure 
you understand it yourself. Documentation 
shouldn't be an afterthought; it should begin 
with the design of the program (when you 
write what it is going to do), and continue 
with comments written along with the 
instructions. 

Good documentation isn't found in sheer 
number of comments (though there should 
be a lot); it's found in comments that ex- 
plain the operation of the program. Com- 
ments are especially needed for data, sub- 
routines, and points reachable by jumps. 
Variables and constants should be explained 
so that the reader will see how they can be 
used; this allows us to spot threats to 
them, such as Mismatched Units and Clob- 
bered Values. If the language allows, give 
constants names rather than using their 
numeric values throughout the program; 
this makes updating easier and renders 
the Common Typo's attacks more con- 
spicuous. Subroutines should be prefaced 
with a description of how they are called, 
what inputs are needed, what values are 
returned, and what information may be 
destroyed in the process. Jump points 
should have an explanation of the con- 
ditions under which they are reached. 

To make a program at least partly self- 
documenting, the name of a routine or 
variable should indicate its use. One of the 
major weaknesses of BASIC is that it doesn't 
allow this to be done very much; this is a 
reason for having a lot of comment state- 
ments to explain what BASIC variables 
and subroutines are used for. 

Just as a sample, here's a preface to a 
hypothetical 8080 assembly language sub- 
routine (see box). The comments explicitly 
define linkage conventions. 

The protection provided against Botched 
Calls should be obvious. 

• Testing. If you follow the approach
outlined so far, you'll have a better chance 
of getting your program to work, but you 
may still have planted a few bugs inadver- 
tently. So you have to test the program 
before declaring it bug-free. Testing 
should begin with a simple version of the 
program, if possible; but it should begin 
only after the program has been written 
with enough care so that there's a chance 
of not finding any bugs.

Use whatever debugging tools are avail- 
able. High-level languages will usually pro- 
vide useful information when the program 
goes wrong. Versions of BASIC that allow 
single statements to be executed make it 
possible to find something about the 
conditions under which an error occurred. 

When working in machine language, a 
debugging program will ease discovery of 
bugs. Such a program allows the user to 
put breakpoints into the program being 
tested (returning control to the debugger 
when the program counter reaches a certain 
address) and to examine and modify regis- 
ters and memory. These programs range 
from simple 1 K monitors to powerful 
symbolic debuggers like Digital Equipment 
Corporation's DDT (Dynamic Debugging 
Tool, no relation to the name as used here). 
Having one of these in ROM can be a 
tremendous help. 

If the program works the first time, try it 
again with different data to make sure. 
Check out simple cases. Sometimes a pro- 
gram will work in complicated cases, but be 
bitten by the Fencepost Bug in simple ones. 
Check out more complicated cases. If 
possible, use a random number table as a 
source of test data, along with handpicked 
cases. 
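Below is a minimal Python sketch of that testing mix. A routine under test (here a hand-written insertion sort, chosen only as an example) is compared against a trusted reference. The comparison runs on handpicked simple cases and on random data. Python's random module stands in for a random number table. The fixed seed is our own choice, so that any failure can be reproduced.

```python
import random
from typing import List, Sequence


def insertion_sort(values: Sequence[int]) -> List[int]:
    """Routine under test (any routine with a trusted reference would do)."""
    result = list(values)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]     # shift larger items right
            j -= 1
        result[j + 1] = key
    return result


def count_failures(cases: List[List[int]]) -> int:
    """Compare against Python's sorted() and count disagreements."""
    return sum(1 for case in cases if insertion_sort(case) != sorted(case))


# Simple cases first: this is where fencepost errors tend to show up.
HANDPICKED = [[], [7], [2, 1], [1, 1, 1], [3, 2, 1], [1, 2, 3]]

rng = random.Random(1979)                 # fixed seed: failures can be rerun
RANDOMIZED = [[rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
              for _ in range(200)]
```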

If the program doesn't work the first 
time, try it again with different data. Aim 
for the simplest case possible. If you can
get the program to do something right, 
that will cut down the number of places 
where bugs may be lurking. 

When a program is being tested, the work
is easiest if execution comes to a screeching
halt as soon as something goes wrong. A 
program may be able to run a while after 
crucial damage has occurred, only to 
clobber all of memory before stopping. 
If this happens, it can be almost impossible 
to localize the source of the disaster. But 
if the program makes periodic checks for 
error conditions (such as impossible values 
or invalid relationships) and reports them, 
there's a better chance of discovering just 
where things went wrong. For instance, 
a routine that fills a block of memory 
between two addresses might check to make 
sure that the low address is really lower 
than the high address. Redundant tests 
may slow down the program, but they 
can be taken out when all the bugs are 
known to be dead. 
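The block-fill example can be sketched in Python, assuming memory is modelled as a `bytearray` and the fill is inclusive of both addresses (the name `fill` is hypothetical). The redundant checks stop execution at once with a message naming the bad values, instead of letting a reversed or out-of-range address pair clobber memory far from the real bug.

```python
def fill(memory: bytearray, low: int, high: int, value: int) -> None:
    """Store `value` in memory[low] through memory[high], inclusive."""
    # Redundant sanity checks: halt immediately with a useful message.
    if not 0 <= low <= high < len(memory):
        raise ValueError(f"bad fill range: low={low:#06x} high={high:#06x}")
    if not 0 <= value <= 0xFF:
        raise ValueError(f"bad fill value: {value}")
    for address in range(low, high + 1):
        memory[address] = value
```

Once the program is trusted, the two `if` tests are the part that could be removed to regain speed.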

The overriding consideration to remem- 
ber in the use of this Design, Document and 
Test technique is that it's open-ended. It 
will, in principle, kill any kind of bug; but 
a new approach to design, a better scheme 
of documentation, or a novel test may be 
needed for subtle species. Approaching 
bugs scientifically means thinking about 
them. It means recognizing that any bug 
will have important similarities to pre- 
viously encountered bugs; and that it may 
have equally important differences. So 
when you find yourself struggling to dis- 
cover what's wrong with a program whose 
behavior is incomprehensible, you can 
console yourself with the thought that you 
may be about to make an exciting entomological discovery that you can use repeatedly. 



COMPUTE PROBABILITY OF WIDGET BREAKAGE 

INPUT  - MASS OF WIDGET (GRAMS) IN REGISTER PAIR BC 
         AGE OF WIDGET (DAYS) IN REGISTER PAIR DE 
OUTPUT - PROBABILITY OF BREAKAGE (PERCENT) IN REGISTER PAIR BC 
ALL OTHER REGISTERS ARE CLOBBERED 
PROGRAM DETAILS 
About This Section 



This section deals mainly with one of the more difficult aspects of a program's structure: 
tables. For any but the most elementary applications, the programmer finds that he or she needs 
to construct some kind of table for a variety of purposes: branching, symbols, data. In fact, 
virtually any file of data can ultimately be thought of as a table. This section should 
answer many of your questions about a variety of tables. 

The second topic covered in this section is how to create and maintain binary trees. This 
subject has a reputation that scares a lot of people away from using trees. But when working with 
large amounts of unsorted data, the fastest way to reference any particular piece of it is often 
to arrange it as a binary tree. Now there is no longer anything to fear about 
binary trees. 
An Introduction to Tables 



F James Butterfield 



The construction and use of program 
tables is the gateway to developing powerful 
programs. The new programmer may have 
trouble getting to know the concept of 
tables, but time spent learning about tables 
is well worth the effort. 

The first few programs to go into your 
home computer are likely to be written 
using a multitude of IF tests: If a value 
equals 1, branch to a particular routine; if 
equal to 2, another branch; if over 5, yet 
another branch; and so on. After a while 
this gets to be a lot of work. Programmers 
quickly learn to use table structures to 
simplify decision making. 
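A short Python sketch of the contrast described above, with hypothetical routine names: the same branching written first as a chain of IF tests and then as a table of routines indexed by the code value. With the table, adding a case means adding one entry rather than writing another test.

```python
def start() -> str:
    return "starting"

def stop() -> str:
    return "stopping"

def unknown() -> str:
    return "no such command"

def choose_with_ifs(code: int) -> str:
    # One IF test per value: every new case is more code to write.
    if code == 1:
        return start()
    if code == 2:
        return stop()
    return unknown()

# The same decisions kept as data: code value -> routine to branch to.
ROUTINES = {1: start, 2: stop}

def choose_with_table(code: int) -> str:
    routine = ROUTINES.get(code, unknown)  # look the code up, then "branch"
    return routine()
```

A range test such as "if over 5" does not fit an exact-match table like this one; the sequential tables discussed later handle that kind of decision.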

Tables are called by many names, de- 
pending on the language and the application: 
arrays, vectors and matrices, to name three. 
Even the concept of a "file" is usually just a 
large table which follows the same structural 
rules but is stored on disk or tape. 

Table Elements 

Most of the tables we meet in books, 
forms and so on consist of data arranged 
in rows and columns. Each row usually 
contains a record about something. Name, 
address, age, phone number might be the 
record of a schoolmate. Each item of this 
record, such as name, is called a field. In 
most cases, each record contains the same 
number of fields; this is called a rectangular 
table because of its appearance when 
printed, and is by far the easiest type to 
handle. 

Rows and columns can be interchanged, 
of course, by laying the table on its side. 
Let's look at two ways to encode this small 
table: 



Name   Age   Phone 
Joe    14    515-3838 
John   18    216-3001 
Pete   17    414-3377 

First we could encode each line this way: 

record 1   field 1   Joe 
           field 2   14 
           field 3   515-3838 

This is the most common, and usually the handiest, way to set up the table. It's logical, easy to change or to add new items to, and not difficult to program a search routine for. All the data for a particular line of the original table is in one record. However, during this search, we must leap 12 bytes or so each time we wish to examine a new record. This may or may not be convenient to do, depending on hardware characteristics. By laying the table on its side, we could write: 

record 1   field 1   Joe 
           field 2   John 
           field 3   Pete 
record 2   field 1   14 
           field 2   18 
           ... etc 

This method is in some ways like de- 
voting a separate table to each kind of data 
in the big table: a table of names, a table of 
ages, etc. This type of organization might 
make it a little easier to search for a name, 
but it becomes tougher to add a new name 
to the list, and harder to read. But either 
way works. 
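Both layouts can be sketched in Python using the three-line table above. The function names are hypothetical; the point is that the record layout steps a whole record at a time, while the "on its side" layout scans one list and then uses the same position in another list.

```python
# Record layout: each record holds one line of the table (name, age, phone).
RECORDS = [
    ("Joe", 14, "515-3838"),
    ("John", 18, "216-3001"),
    ("Pete", 17, "414-3377"),
]

# The table laid on its side: one list per kind of data.
NAMES = ["Joe", "John", "Pete"]
AGES = [14, 18, 17]
PHONES = ["515-3838", "216-3001", "414-3377"]

def phone_by_record(name: str):
    for record_name, _age, phone in RECORDS:  # leap one whole record each time
        if record_name == name:
            return phone
    return None

def phone_by_columns(name: str):
    for position, record_name in enumerate(NAMES):  # scan the names only
        if record_name == name:
            return PHONES[position]  # same position in the phone list
    return None
```

Adding a person to the second layout means touching all three lists and keeping them in step, which is the extra difficulty the text mentions.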

Order of Items 

One of the most important decisions you 
must make in designing a table is how to 
order the records. For small tables it 
doesn't matter very much. But as tables get 
bigger, it becomes important not to waste 
time on lengthy searches. 

At first glance, the simple answer is to 
put the most often used items at the top of 
the table where they'll be found first, a pro- 
cedure which frequently works well. But 
you must know roughly how often each 
table item is likely to be used. If the usage 
pattern changes, your table lookup becomes 
inefficient. Beware of elaborate schemes to 
rearrange the table order as usage changes: 
they can quickly use up more time than they 
save. 

An excellent method for ordering tables 
is to use the table address itself as the item 
to be matched. Let's clarify this with an 
example. Suppose we have a character in 
Baudot (5 level) code that we want to trans- 
late, say, to ASCII. The lowest value possible 
is blank, or 00000 (decimal zero). The 
highest value is the letters shift, or binary 
11111 (decimal 31). If we add this character, 
as a binary number, to the table base 
address, we'll create an address ranging from 
TABLE+0 to TABLE+31. In each of these 
table locations, the corresponding ASCII 
character will be stored. We'd have to make 
provision for both upper case and lower case 
Baudot, of course. The important thing 
about this kind of table is that we never have 
to search it. We go straight to the address 
we want. 
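A minimal Python sketch of a directly addressed translation table, with one 32-entry table per Baudot shift. The table contents here are placeholders only (for example `"<L05>"` stands for whatever character letters-shift code 5 should produce); they are not the real Baudot assignments, which would have to be filled in from a code chart.

```python
# PLACEHOLDER contents: not the real Baudot-to-ASCII assignments.
LETTERS = [f"<L{code:02d}>" for code in range(32)]  # lower ("letters") case
FIGURES = [f"<F{code:02d}>" for code in range(32)]  # upper ("figures") case

def baudot_to_ascii(code: int, figures_shift: bool = False) -> str:
    """Translate one 5-bit code by direct indexing: TABLE + code."""
    if not 0 <= code <= 0b11111:
        raise ValueError("a Baudot character is 5 bits (0..31)")
    table = FIGURES if figures_shift else LETTERS
    return table[code]  # no search: the code itself is the position
```

Whatever the contents, the lookup cost is the same for every code, since nothing is compared.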

The most common way of ordering items 
in a table is sequential, ie: in ascending or 
descending order, alphabetically or numeri- 
cally. Usually we must pick one particular 
field for the sequence, the one we expect 
to search most often. 

We get many advantages when we have a 
sequential table. The program can detect 
right away if it has "gone past" the item it's 
looking for, so that it won't waste time 
searching through the rest of the records. 
With a little more programming effort, we 
can write a binary search program that 
passes through a table very quickly. The bi- 
nary search routine works by examining the 
middle of the table and deciding if the de- 
sired item is above or below this point. From 
then on, the program concentrates exclu- 
sively on the remaining half of the table, and 
looks at its midpoint in the same way. Each 
step cuts the remaining portion of the table 
in half; eventually the desired location is 
found or a conclusion of "no match" results. 
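The halving procedure above can be sketched in Python as follows (the function name is illustrative). Each pass compares the key against the middle of the remaining portion and discards the half that cannot contain it; when nothing remains, the result is "no match".

```python
def binary_search(table: list, key):
    """Return the position of `key` in the ascending list `table`, or None."""
    low, high = 0, len(table) - 1
    while low <= high:                  # some portion of the table remains
        middle = (low + high) // 2
        if table[middle] == key:
            return middle
        if table[middle] < key:
            low = middle + 1            # the key can only be in the upper half
        else:
            high = middle - 1           # the key can only be in the lower half
    return None                         # remaining portion is empty: no match
```

For a sorted table of 256 entries, at most 9 comparisons are needed, against an average of 128 for a straight scan.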

A sequential table is the only type that can be used for a continuous value calculation. You may recognize the following partial table: 

Income            Tax 
less than 2350 
less than 2375      2 
less than 2400      5 

This table associates a continuous value, income, with unique tax amounts. If your income was $2378.54, you do not escape tax because there isn't an exact value of $2378.54 in the table. For your program to find such an intermediate value, the table 
must be sequential. 
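A Python sketch of the intermediate-value lookup, using only the two complete rows of the partial tax table (the tax for the "less than 2350" row is not shown, so that row is left out). Because the limits are in ascending order, the standard `bisect` module can find the first "less than" row that applies.

```python
import bisect

# The two complete rows of the partial table: "less than" limits and taxes.
LIMITS = [2375, 2400]
TAXES = [2, 5]
LOWEST = 2350  # incomes below this fall in rows the fragment doesn't cover

def tax_for(income: float) -> int:
    if income < LOWEST or income >= LIMITS[-1]:
        raise ValueError("income is outside this partial table")
    # bisect_right gives the position of the first limit strictly greater
    # than income, which is the first "less than" row that applies.
    return TAXES[bisect.bisect_right(LIMITS, income)]
```

With this table, `tax_for(2378.54)` lands in the "less than 2400" row and returns 5, even though 2378.54 appears nowhere in the table.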

There are several drawbacks to sequential tables. The first is the problem of getting the table in sequential order and keeping it that way during deletions and additions. The second is that only one field is in sequence. This means that the user may have to re-sort the whole table to start searching on a new field. 



Advanced Techniques 

When it is desired to arrange a table in 
some order, there may be some difficulty 
moving the items around, especially if they 
are large and clumsy. 

One way to get around this is to leave the 
data in its original order and build a separate 
table called an index which gives the order 
in which the data should be read. This way, 
instead of moving the data around, the index 
is simply changed as necessary. 
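An index can be sketched in Python as a list of record positions (the names `PEOPLE`, `by_name` and `add_record` are hypothetical). The records stay where they arrived; adding a record appends the data and moves only entries of the index.

```python
# The records stay in the order they arrived; they are never moved.
PEOPLE = [("Pete", 17), ("Joe", 14), ("John", 18)]

# The index holds record positions in the order they should be read.
by_name = sorted(range(len(PEOPLE)), key=lambda i: PEOPLE[i][0])

def add_record(record) -> None:
    PEOPLE.append(record)               # new data simply goes on the end
    new = len(PEOPLE) - 1
    pos = 0
    while pos < len(by_name) and PEOPLE[by_name[pos]][0] < record[0]:
        pos += 1                        # find where the new name belongs
    by_name.insert(pos, new)            # only the index is rearranged

def read_in_order():
    return [PEOPLE[i] for i in by_name]
```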

Another way to achieve a similar effect is 
by chaining. This attaches an extra field to 
each record which points to the record to be 
looked at next. The program must have a 
starting point that tells which record is to 
be examined first. From then on, the program follows the chain to the last record. 

Indexing and chaining are both relatively 
complex, but they have one important ad- 
vantage: the same file can have two indices 
or two chains so that it is simultaneously 
sorted two different ways. This feature can 
sometimes eliminate many time-consuming 
sorts. 
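A Python sketch of one file kept in two orders at once by chaining. Each record carries two hypothetical link fields, `next_name` and `next_age`, holding the position of the next record in that order (`None` ends a chain), and `START` says where each chain begins. The links are set by hand here just to show the idea.

```python
records = [
    {"name": "Pete", "age": 17, "next_name": None, "next_age": 2},
    {"name": "Joe",  "age": 14, "next_name": 2,    "next_age": 0},
    {"name": "John", "age": 18, "next_name": 0,    "next_age": None},
]
START = {"name": 1, "age": 1}  # Joe comes first in both orders

def follow(field: str) -> list:
    """Read the names in the order given by one chain."""
    order, position = [], START[field]
    while position is not None:
        order.append(records[position]["name"])
        position = records[position]["next_" + field]
    return order
```

Neither chain required moving a single record, and the file reads as sorted by name or by age depending on which link is followed.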

Tables which are not rectangular are a 
source of difficulty. If we are recording, 
for example, names of parents and their 
children, we soon face the problem of 
some parents having only one child, while 
others have seven or more. Should we allow 
seven slots for each set of parents and waste 
precious memory? We could build a complex 
table structure to allow for a variable num- 
ber of fields (children). This is practical, of 
course, but sometimes we can eliminate the 
problem by making the table into a list of 
the children rather than the parents. 

Another special case which is often 
encountered is the triangular table, which 
resembles a square split along the diagonal, 
with the two halves containing the same 
numbers. For example, if you calculate a 
table of mileages between cities, you don't 
need to store both the Buffalo to Denver 
and the Denver to Buffalo mileages; they 
are of course the same. But trying to store 
only half the table to save memory turns 
out to be a difficult task. You'll need a 
medium sized program to get to the right 
spot in the table. 
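One common way to reach the right spot in a half-stored table is sketched below in Python, under the assumption that cities are numbered 0 to n-1 and that only the lower triangle (row greater than column) is kept. Row i of that triangle has i entries, so it starts after 0 + 1 + ... + (i-1) = i(i-1)/2 cells. The class and method names are hypothetical, and the mileage used in the test is made up.

```python
class TriangularTable:
    """Distances between n cities, each pair stored only once."""

    def __init__(self, n: int):
        self.n = n
        self.cells = [0] * (n * (n - 1) // 2)  # no diagonal, no mirror image

    def slot(self, i: int, j: int) -> int:
        if i == j or not (0 <= i < self.n and 0 <= j < self.n):
            raise IndexError("need two different cities")
        if i < j:
            i, j = j, i  # A-to-B is the same entry as B-to-A
        # Rows 0..i-1 of the lower triangle hold i*(i-1)/2 cells in all.
        return i * (i - 1) // 2 + j

    def store(self, i: int, j: int, miles: int) -> None:
        self.cells[self.slot(i, j)] = miles

    def fetch(self, i: int, j: int) -> int:
        return self.cells[self.slot(i, j)]
```

For 100 cities this keeps 4950 entries instead of 10,000, at the cost of a multiply and a shift on every access.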
Access 

The addressing modes of your machine warrant study to determine the best way to scan tables. If you have a hardware index register, that's usually the best way both in terms of speed and programming convenience. Each microprocessor has its idiosyncrasies. An 8 bit index will only cover a table size of 256 locations. Sometimes, though, the instruction supplies only an 8 bit offset for the index to modify, rather than a full address. In this case the index must hold a full address rather than a simple table position. How easy is the index to modify as you step through the table? An increment command that adds one to the index value is of limited value if you want to jump 12 locations at a time. 

If indexing isn't convenient for a given job, indirect addressing is the next best bet. Put the address of the start of your table into an indirect address location; then add to it as necessary until you reach the end of the table. 

Don't hesitate to search a table backwards if it's convenient. This facilitates searches when using certain types of indexing. 

Program Intercommunication 

One program segment can communicate 
with another by means of tables. In fact, 
processors which feature a common memory 
use this technique. When working with an 
interrupt structure, the recommended pro- 
cedure is to have one program prepare a 
table of material for another to pick up. 
This becomes a good way to segment large 
projects into convenient modules. Each 
module can be separately debugged by 
preparing a set of test input tables and 
examining the output tables it produces. 
On very large jobs, this kind of segmen- 
tation is an excellent way to divide work 
among several people. Even online debugging 
becomes easier, since the tables can be 
readily viewed at any time. 
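One common shape for such a table is a fixed-size circular buffer, sketched here in Python: one routine (say, an interrupt handler) puts items in, another takes them out, and the table can be inspected at any moment. The class name and sizes are hypothetical, and the sketch ignores the extra care a real interrupt-driven version needs when an interrupt can arrive between two updates of the shared count.

```python
class MessageTable:
    """A fixed-size circular table: one routine fills it, another empties it."""

    def __init__(self, size: int = 8):
        self.slots = [None] * size
        self.put_at = 0    # next slot the producer writes
        self.take_at = 0   # next slot the consumer reads
        self.count = 0     # items waiting to be picked up

    def put(self, item) -> bool:
        if self.count == len(self.slots):
            return False   # table full: the producer must wait or drop the item
        self.slots[self.put_at] = item
        self.put_at = (self.put_at + 1) % len(self.slots)
        self.count += 1
        return True

    def take(self):
        if self.count == 0:
            return None    # nothing waiting
        item = self.slots[self.take_at]
        self.take_at = (self.take_at + 1) % len(self.slots)
        self.count -= 1
        return item
```

Each side can be debugged alone: feed `put` a prepared list of test items, or preload the table and check what `take` returns.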

Conclusion 

Tables are a good way to arrange data 
in a compact, visible and easy to modify 
form. New programmers sometimes have 
problems getting used to designing and 
using them, but they are well worth the 
effort. 
Hashed Symbol Table 



Hashing is the meat and potatoes of symbol table handling. 



John Beetem 



It is often necessary to convert alphanumeric 
code into numeric code efficiently. 
This article describes how to do this using a 
powerful data structure called the hashed 
symbol table. Assembly language code is 
included for the 8080 microprocessor, but 
the algorithms and structures apply to any 
computer. 

A symbol table is a set of ordered pairs 
called entries. The first element of each pair 

contains a symbol (usually in ASCII) and the 
second contains the object the symbol 
represents. There are three operations which 
are applied to a symbol table: 

• Lookup (also called search): An input 
symbol, called the key, is compared to 
the symbol in an entry of the symbol 
table. When a match occurs, the object 
associated with the symbol is output. 
If no match occurs, this condition is 
indicated; 

• Insert: An entry is appended to the 
symbol table; 

• Delete: An entry is removed from the 
symbol table. 

A structure to make these operations easy 
and efficient is the object of this article. 



Lookup 



Lookup, if done wrong, can be a very time consuming operation. The most fundamental lookup structure is a simple array where the entries are placed sequentially in memory. If the number of entries is large, lookup is quite slow because the key must be compared to half of the entries on the average. A sorted array of entries can be searched by methods such as a binary search, which is considerably better (and much more complicated). But the best method seems to be one called hashing. 

A hashed symbol table consists of many 
arrays of entries, called buckets (my system 
uses 64 arrays). Each element in a bucket 
has the same hash code for its symbol. A 
hash code is computed from the symbol 
itself using a pseudo random method, such 
as adding the binary representations of all 
the characters in the symbol and using the 
low order six bits of the result. Using a good 
hashing method, the symbols are well 
distributed over the buckets, and each 
bucket is fairly short. 



A Note About Notation 

The routines described in this article are represented in two notations. Figures 1 through 5 show the various algorithms in the Warnier-Orr structured programming discipline. This notation is more fully described in David Higgins' articles in the PROGRAM STRUCTURE section of this edition. Listings 1 through 5 provide the author's corresponding 8080 assembly language versions of the programs. ...BL 



LEXCMP (1,n) 
    BEGIN 
        get key 
        get character to compare 
        compare characters 
        not equal (0,1) 
            set not equal flag 
        equal (0,1) 
            BEGIN 
                get next character 
                last character in string (0,1) 
                    set equal flag 
                not last character in string (0,1) 
                    SKIP 
            END 
    END 

Figure 1: LEXCMP compares two ASCII strings. My format assumes that the last byte of each string is marked by its sign bit being set. The strings can be of any length, and don't have to both be the same length. Equality of strings is indicated by returning with the match flag set. Otherwise the match flag is cleared. 
To look up a key, the following algorithm 
is used: 

• Compute the hash code; 

• Use the code to find the bucket; 

• Use a simple sequential search through 
the bucket to match the key. 

Since each bucket is short, this is an 
efficient way to perform the lookup. 
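The three steps can be sketched in Python with the additive hash mentioned earlier (add the character codes, keep the low-order six bits). For clarity each bucket here is a separate Python list; the names `hash_code`, `table`, `insert` and `lookup` are illustrative, and the article goes on to explain why a real implementation keeps all buckets in one shared region of memory instead.

```python
BUCKETS = 64

def hash_code(symbol: str) -> int:
    """Add the character codes and keep the low-order six bits (0..63)."""
    return sum(symbol.encode("ascii")) & 0x3F

table = [[] for _ in range(BUCKETS)]  # one short list per hash code

def insert(symbol: str, value) -> None:
    table[hash_code(symbol)].append((symbol, value))

def lookup(key: str):
    bucket = table[hash_code(key)]   # steps 1 and 2: hash, then find the bucket
    for symbol, value in bucket:     # step 3: short sequential search
        if symbol == key:
            return value
    return None                      # no match
```

Anagrams such as CAT and ACT always share a bucket under an additive (or exclusive-OR) hash, so the sequential search within the bucket is still needed.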

To insert or delete an entry, first hash the 
symbol to find the right bucket, then insert 
or delete the entry into or from that bucket. 
This last operation is dependent on bucket 
structure, and will be discussed presently. 

Storage could be a problem. Would it 
make sense to have 64 different arrays, one 
for each bucket? No, because one bucket 
could become filled while others are empty, 
and it's silly to run out of space when there 
is plenty left. So it would be nice to store all 
the entries in the same array in memory. 
How does one indicate the bucket structure? 

Linked List 

A linked list consists of a group of things 
called nodes. Each node contains data and 
one or more pointers to other nodes. (A 
binary tree is a linked list.) This structure is 
used to solve our problem as follows: 

Each node contains a symbol table entry 
and the sixteen bit address of the next node 
in the same bucket. There is also a 128 byte 
array containing the sixteen bit addresses of 
the first node in each bucket. An address 
such that the high byte is zero indicates that 
there are no more nodes in that bucket. The 



LEXCMP: LDAX B       ;Load A with character addressed by BC. 
        CMP  M       ;Compare with character addressed by HL. 
        RNZ          ;If not equal, return with zero flag clear. 
        INX  B       ;Advance to next character in each string. 
        INX  H 
        ORA  A       ;If not last character in both strings, 
        JP   LEXCMP  ;then continue comparison, else: 
        XRA  A       ;Set zero flag and clear carry flag. 
        RET          ;Return. 

Listing 1: Subroutine LEXCMP (LEXical CoMPare) compares two ASCII strings. The addresses of the beginning of the strings are stored in the HL and BC registers. The last byte of each string is marked by its sign bit being set. The strings can be of any length, and don't have to be the same length. Equality of strings is indicated by returning with the zero flag set; otherwise the zero flag is clear. 



BUCKET: [block of 128 bytes] 

HASH:   XRA  A         ;Clear A. 
        XRA  M         ;XOR next character in string. 
        INX  H         ;Advance to next character. 
        JP   HASH+1    ;If not last character, continue hashing. 
        ANI  3FH       ;Use low 6 bits of result as hash code. 
        MVI  H,00      ;Load HL with hash code. 
        MOV  L,A 
        DAD  H         ;Double hash code since addresses are two 
                       ;bytes long. 
        LXI  D,BUCKET  ;Load DE with address of bucket pointer 
                       ;array. 
        DAD  D         ;HL contains address of the pair of bytes 
                       ;containing the address of the first node 
                       ;in the bucket. 
        RET            ;Return. 

Listing 2: Subroutine HASH computes the hash code of the symbol addressed by the HL registers by exclusive-OR'ing the characters in the string. HASH returns the hash code in A, and the address of the address of the first node in the bucket in HL. 



HASH 
    BEGIN 
        clear register 
        get first character in symbol 
        offset (1,n) 
            BEGIN 
                perform exclusive OR operation between register and character 
                get next character 
                last character (0,1) 
                    SKIP 
                not last character (0,1) 
            END 
        use low order 6 bits as hash code 
        double hash code 
        get beginning address of BUCKET 
        add beginning address to hash code to get hash address 
    END 

Figure 2: HASH computes the hash code of the symbol by exclusive-OR'ing the characters in the string. HASH returns the hash code and the address of the address of the first node in the bucket. 
LOOKUP 
    BEGIN 
        save key 
        HASH: hash symbol to get first node in bucket 
        loop (1,n) 
            BEGIN 
                get address of next node 
                address of next node = 0 (0,1) 
                    set no match flag 
                address of next node not = 0 (0,1) 
                    BEGIN 
                        LEXCMP: compare key and symbol 
                        matched (0,1) 
                            get address of parameters 
                        not matched (0,1) 
                            get next node 
                    END 
            END 
    END 

Figure 3: LOOKUP searches for the key symbol in the symbol table. When the symbol is matched, the address of the parameters represented by the symbol is returned and the no-match flag is cleared. If the symbol cannot be found, LOOKUP returns with the no-match flag set. 



LOOKUP: MOV  B,H     ;Save address of key in BC. 
        MOV  C,L 
        CALL HASH    ;Hash symbol: HL contains address of the 
                     ;address of the first node in the bucket. 
LOOP:   MOV  E,M     ;Load DE with address of next node in list. 
        INX  H 
        MOV  D,M 
        MOV  A,D     ;Load A with high byte of address. 
        ORA  E       ;If DE is zero (symbol not matched) 
        STC          ;then (set carry. 
        RZ           ;Return.) 
        MOV  H,D     ;Load HL with address of node. 
        MOV  L,E 
        INX  H       ;Span (pass over) 2 byte field containing 
        INX  H       ;address of next node in list. 
        PUSH B       ;Save address of key. 
        CALL LEXCMP  ;Compare key to symbol in node. 
        POP  B       ;Restore address of key into BC. 
        RZ           ;Return if successful match. (Carry cleared 
                     ;by LEXCMP) 
        XCHG         ;Load HL with address of address of next 
                     ;node in list. 
        JMP  LOOP    ;Continue search through bucket. 

Listing 3: Subroutine LOOKUP searches the symbol table for the symbol addressed by the HL registers (the key). On the symbol's first match, the address of the parameters represented by the symbol is returned in HL with carry clear. If the symbol cannot be found, LOOKUP returns with carry set. 



logical structure of memory (as the program 
sees it) is different from the physical struc- 
ture of memory. The buckets are stored in 
the same region of memory and there is no 
"crosstalk" between buckets. 

Symbols consist of a variable length array 
of bytes containing 7 bit ASCII characters. 
The last character in the symbol is indicated 
by the sign bit being set, whereas the other 
characters have the sign bit clear. This is 
necessary so that both ends of the symbol 
are known. 

A complete node consists of: 

• 16 bit address of the next node: byte 1 contains the low 8 bits, byte 2 contains the high 8 bits; 

• n bytes of symbol in ASCII; 

• m bytes of parameters represented by the symbol. 

We are now ready to look at some code. 

The Routines 

Subroutine LEXCMP (figure 1) is used to 
compare two ASCII character strings. The 
strings need not be of equal length. 

LEXCMP works as follows: the first 
two characters are compared. If they are not 
equal, LEXCMP returns with a not equal flag 
set. If the first two characters are equal, 
LEXCMP checks if those were the last 
characters in the strings. If so, LEXCMP 
returns with the not equal flag clear; otherwise, 
the next two characters are compared, 
and so on. 
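The same comparison can be sketched in Python, assuming symbols are stored as bytes with the sign bit (0x80) set on the last character, as the article specifies. `encode_symbol` and `lexcmp` are illustrative names. Because a shorter string's marked last byte can never equal an unmarked byte of a longer one, the loop stops before running past the end of either string.

```python
def encode_symbol(text: str) -> bytes:
    """Store a non-empty 7-bit ASCII symbol with its last byte's sign bit set."""
    data = bytearray(text.encode("ascii"))
    data[-1] |= 0x80
    return bytes(data)

def lexcmp(a: bytes, b: bytes) -> bool:
    """True if two sign-bit-terminated symbols are equal."""
    i = 0
    while True:
        if a[i] != b[i]:
            return False   # characters differ (RNZ in Listing 1)
        if a[i] & 0x80:
            return True    # equal, and both are the last character
        i += 1             # advance in both strings (INX B / INX H)
```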

Subroutine HASH (figure 2) computes 
the hash code of a symbol. HASH then 
computes the address of the pointer to the 
correct bucket. 

HASH uses an exclusive-OR function to 
hash the characters. This makes it very easy 
to detect the end of the symbol, as only the 
last character will set the sign bit in the A 
register. BUCKET is the starting address of a 
128 byte array containing the addresses of 
the first node in each bucket. 
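HASH can be mirrored in Python to check the arithmetic, assuming BUCKET starts at address 0000 as in Table 1. The loop stops when the running exclusive-OR first has its sign bit set, which can only happen at the marked last character; the low six bits are then doubled because each bucket pointer occupies two bytes. With these assumptions, CAT, ACT and TAC come out at 002C and BAT and TAB at 002E, the same locations used in Table 1.

```python
BUCKET = 0x0000  # start of the 128-byte pointer array, as in Table 1

def encode_symbol(text: str) -> bytes:
    """7-bit ASCII with the sign bit set on the last character."""
    data = bytearray(text.encode("ascii"))
    data[-1] |= 0x80
    return bytes(data)

def hash_symbol(symbol: bytes):
    """Return (hash code, address of the bucket's first-node pointer)."""
    code = 0
    for byte in symbol:
        code ^= byte
        if code & 0x80:      # only the marked last character sets the sign bit
            break
    code &= 0x3F             # keep the low six bits: 64 buckets
    return code, BUCKET + 2 * code  # two bytes per pointer (DAD H)
```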

Subroutine LOOKUP (figure 3) searches 
for a key. If the key cannot be found in the 
table, LOOKUP returns with a not found 
flag set; otherwise, LOOKUP returns the 
address of the parameters associated with 
the symbol, and clears the not found flag. 

Subroutine INSERT, shown in figure 4, 
inserts a node into the symbol table. This is 
very easy to do using the linked structure. 
Only two addresses must be moved. The first 
node in the bucket is linked to the new 
node, and the address in the BUCKET array 
links to the new node; thus, the new node 
becomes the first node in the bucket. 

Subroutine DELETE, figure 5, removes a 
node from the symbol table. DELETE 
requires that the node to be deleted was the 
last node inserted into the bucket. This is not as severe a limitation as one might think. (In a compiler for a language such as ALGOL, symbols go out of existence in the reverse of the order in which they came into existence, which is exactly what goes on here.) This limitation simplifies the DELETE operation considerably, and also simplifies reclamation of space (reusing memory freed by deleting nodes). 

DELETE moves the address in the link 
field of the first node in the bucket into the 
proper element of the BUCKET array. Thus 
the second node in the bucket becomes the 
first node. Notice that the INSERT and 
DELETE operations are exactly PUSH and 
POP operations, where the stack is organized 
as a linked list instead of a contiguous array. 
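The push and pop behavior can be sketched in Python with a dictionary standing in for RAM, so that node addresses match Table 1. The names `memory`, `bucket`, `insert`, `delete` and `lookup` are hypothetical. The hash is the exclusive-OR of the characters, with BUCKET assumed to start at 0000; the end-marker sign bit is left out because masking to six bits discards it anyway.

```python
def bucket_address(text: str) -> int:
    """Address of the bucket pointer for `text`, computed the way HASH does."""
    code = 0
    for byte in text.encode("ascii"):
        code ^= byte
    return 2 * (code & 0x3F)  # BUCKET assumed to start at address 0000

memory = {}  # node address -> [link, symbol, value]; stands in for RAM
bucket = {}  # bucket pointer address -> first node address (0 = empty)

def insert(address: int, symbol: str, value: int) -> None:
    head = bucket_address(symbol)
    memory[address] = [bucket.get(head, 0), symbol, value]  # link to old first node
    bucket[head] = address  # new node becomes first (PUSH)

def delete(address: int) -> None:
    link, symbol, _value = memory[address]
    head = bucket_address(symbol)
    if bucket.get(head, 0) != address:
        raise ValueError("only the first node in a bucket may be deleted")
    bucket[head] = link  # second node becomes first (POP); node stays in memory

def lookup(symbol: str):
    node = bucket.get(bucket_address(symbol), 0)
    while node != 0:
        link, name, value = memory[node]
        if name == symbol:
            return value
        node = link
    return None
```

Replaying the sequence of Table 1 (insert BAT, CAT, ACT and TAB, delete ACT, insert TAC) leaves the bucket at 002C pointing to 1020 and the bucket at 002E pointing to 1018, as in the table's final column.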

This set of routines could be the basis for 
any symbolic data handling program. The 
structures are not limited to ASCII and 
could be used for any code, for example, a 
phonetic language. This system is the sym- 
bolic backbone of a compiler or interpreter: 
LISP could be kept very happy. Notice that 
you can have a symbol defined many times: the most recent assignment is valid (it has "extent"), yet the older ones still exist (they have "scope"). This behavior follows from the insertion and deletion method, and is the basis for block structured languages. 



INSERT 
    BEGIN 
        get symbol 
        HASH: hash symbol to get first node in bucket 
        get address of new node's link field 
        get address of first node in bucket 
        make new node first node in bucket 
    END 

Figure 4: INSERT inserts a node into the symbol table. The new node becomes the first node in the bucket. 

INSERT: PUSH H       ;Save address of new node. 
        INX  H       ;Move up to symbol: 
        INX  H       ;span link bytes. 
        CALL HASH    ;Hash symbol: HL contains address of address 
                     ;of first node in bucket. 
        POP  D       ;Load DE with address of new node's link field. 
        MOV  A,M     ;Get address of first node in bucket. 
        STAX D       ;Set link of new node to point to first node in 
                     ;bucket. 
        MOV  M,E     ;Make new node first node in bucket. 
        INX  H       ;Do above to second bytes of addresses. 
        MOV  A,M 
        MOV  M,D 
        INX  D 
        STAX D 
        RET          ;Return. 

Listing 4: Subroutine INSERT inserts the node addressed by HL into the symbol table. The new node becomes the first node in the bucket. 

DELETE 
    BEGIN 
        get address of second node 
        HASH: hash symbol to get first node in bucket 
        make second node first node 
    END 

Figure 5: DELETE removes a node from the symbol table. This node must be the first node in the bucket. The second node in the bucket becomes the first node. 

DELETE: MOV  C,M     ;Load BC with address of second node. 
        INX  H 
        MOV  B,M 
        INX  H 
        CALL HASH    ;Hash symbol: HL contains address of address 
                     ;of first node in bucket. 
        MOV  M,C     ;Make second node in bucket the first node in 
        INX  H       ;bucket. 
        MOV  M,B 
        RET          ;Return. 

Listing 5: Subroutine DELETE removes the node whose address is in HL from the symbol table. This node must be the first node in the bucket. The second node in the bucket becomes the first node. 
Table 1 

Address  Initial  After    After    After    After    After    After 
         Data     Insert   Insert   Insert   Insert   Delete   Insert 
                  BAT      CAT      ACT      TAB      ACT      TAC 

0000     0000     0000     0000     0000     0000     0000     0000 
0002     0000     0000     0000     0000     0000     0000     0000 
 ... 
002C     0000     0000     1008     1010     1010     1008     1020 
002E     0000     1000     1000     1000     1018     1018     1018 
 ... 
007E     0000     0000     0000     0000     0000     0000     0000 

1000     0000     0000     0000     0000     0000     0000     0000 
1002     'B','A'  'B','A'  'B','A'  'B','A'  'B','A'  'B','A'  'B','A' 
1004     'T',00   'T',00   'T',00   'T',00   'T',00   'T',00   'T',00 
1006     1234     1234     1234     1234     1234     1234     1234 

1008     0000     0000     0000     0000     0000     0000     0000 
100A     'C','A'  'C','A'  'C','A'  'C','A'  'C','A'  'C','A'  'C','A' 
100C     'T',00   'T',00   'T',00   'T',00   'T',00   'T',00   'T',00 
100E     4688     4688     4688     4688     4688     4688     4688 

1010     0000     0000     0000     1008     1008     1008     1008 
1012     'A','C'  'A','C'  'A','C'  'A','C'  'A','C'  'A','C'  'A','C' 
1014     'T',00   'T',00   'T',00   'T',00   'T',00   'T',00   'T',00 
1016     AF00     AF00     AF00     AF00     AF00     AF00     AF00 

1018     0000     0000     0000     0000     1000     1000     1000 
101A     'T','A'  'T','A'  'T','A'  'T','A'  'T','A'  'T','A'  'T','A' 
101C     'B',00   'B',00   'B',00   'B',00   'B',00   'B',00   'B',00 
101E     6878     6878     6878     6878     6878     6878     6878 

1020     0000     0000     0000     0000     0000     0000     1008 
1022     'T','A'  'T','A'  'T','A'  'T','A'  'T','A'  'T','A'  'T','A' 
1024     'C',00   'C',00   'C',00   'C',00   'C',00   'C',00   'C',00 
1026     7897     7897     7897     7897     7897     7897     7897 
This is one method of storing a hashed symbol table. Assume that the symbols we will be working with are already in memory starting at hexadecimal address 1000. Each entry consists of three parts: 

• the address of the symbol which follows this one in the bucket; 

• the symbol itself; 

• the value represented by the symbol. 

For the purposes of this discussion, the symbols are assumed to be four characters long and followed by a two byte value. 

Column one in table 1 shows how memory is initially arranged. BUCKET occupies memory from hexadecimal address 0000 to 007F; the last bucket pointer is at 007E. This allows arrangement of 64 buckets. All pointers in the symbol and hash tables are initialized to zero. A pointer value of zero indicates that this symbol is the last one in that particular bucket, since that address is not defined in the symbol table. 



Insertion 

The first symbol that will be entered into the BUCKET table is BAT. The symbol BAT hashes to hexadecimal location 002E. The pointer at address 002E is 0000. This means that there are no symbols in this particular bucket. We now set this location equal to hexadecimal 1000, which is the address of the symbol. 

The next symbol we wish to insert is CAT. This symbol hashes to hexadecimal location 002C. Since this location is also equal to zero, indicating no symbols in the bucket, we point it to CAT at location 1008. 

The third symbol to be inserted (ACT) hashes to the same location that CAT did, since exclusive-OR'ing the same letters in any order gives the same result. Since there are already symbols in this bucket, we search the entire bucket to make sure that this symbol is not already contained within the bucket. Since it is not, we will place it at the head of the bucket. This is done by having the first pointer in the bucket point to this symbol (hexadecimal address 1010). The pointer that ACT has is adjusted to hexadecimal address 1008 to point to CAT. CAT's pointer is still 0000, indicating that it is the last symbol in the bucket. 



Deletion 

The particular format that we have adopted requires that any symbol that is to be deleted must be the first node of a bucket. This implies that it was the last symbol added to that bucket. 

Suppose we want to delete the symbol ACT. ACT is the last symbol that was inserted into the bucket located at hexadecimal address 002C. If the pointer at this location is changed to point at the second symbol in the bucket, ACT is effectively eliminated from the hash table. 

If this method is employed, only the minimum number of pointers needs to be changed when inserting or deleting a symbol. Insertion requires that the pointer at the head of the bucket point to the new symbol location, and that the new symbol point to the node that used to be at the head of the bucket. Deletion requires changing only the pointer at the head of the bucket from the first node to the second node of the bucket. 
Figure 1: An Opcode Table Organized for Direct Access. Note that with this particular organization the first data byte of each entry is related to the address of the entry within the table, in a sorted sequence. 

TABLE+0:    00  N  O  P 
TABLE+4:    01  L  D  R 
TABLE+8:    02  S  T  R 
TABLE+12:   03  A  D  D 
TABLE+16:   04  A  N  D 
TABLE+20:   05  R  O  T 
TABLE+24:   06  C  L  R 
TABLE+28:   07  S  T  P 


Hash Tables



Terry Dollhoff 



Hashing is a technique used to speed up table searching operations by
making position in the table depend upon the data. Many newcomers to
programming reject hashing as an overly complicated technique useful only
to the designer of exotic systems software, but this is not the case. Any
large program, written for fun or profit, may include tasks of accessing,
storing, or modifying entries in a table or array. Most game playing
programs include a number of such tasks. Application of hashing techniques
can often dramatically improve the performance of these programs. This
article will explore the use of hashing (sometimes called key-to-address
transformation) as a simple but effective mechanism for accessing stored
data. These techniques can be used in applications where the data is
organized randomly and where each item has a unique key associated with it.
For example, consider a table that contains computer opcode mnemonics and
their associated values as used in an assembler; by using the opcode value
as a key, this table could be used to determine the mnemonic associated
with any particular value. Such a table is an integral part of any
disassembler.

Listing 1: Typical 8080 code sequence for a linear search of a table until the
first byte of the current table entry matches the value in the accumulator. In
this listing, the HL register pair must be preset to the address of the table, the
DE register pair must be set with the number of bytes per table entry, the B
register must contain the number of entries to search (maximum 255) and the
key value sought must be loaded in A. This is by no means the only possible
8080 linear search strategy.

FIND:   CMP  M        ;Check for a match
        RZ            ;If so then exit
        DAD  D        ;Advance to next table entry
        DCR  B        ;Decrement count
        JNZ  FIND     ;Continue till end
        JMP  ERR      ;Table exhausted, treat as error

In any computer, a particular entry in a 
table can be specified by the starting address 
of the entry. Locating an item in a table 
implies that the starting address for that 
item must be determined. One possible 
method that can be used to determine the 
address, and by far the most common 
method, is to examine each item sequen- 
tially, starting with the first item, until the 
desired item is located and hence the item 
address determined. This approach is termed 
a linear search and, as you can see from
the 8080 subroutine of listing 1, it is simple
to code. The big disadvantage of a linear
search is that it is costly in terms of
processing time because, on the average,
one half of the table entries must be
examined before locating the desired item. If
the table is moderately large and numerous 
accesses are required, then the table lookup 
processing time will constitute a significant 
part of the total processing time. 
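For comparison with the hashing methods that follow, here is a Python rendering of the same linear search. It is a sketch rather than a translation of the exact register usage: the function name and the `(offset, tests)` return value are assumptions, added so the number of comparisons can be counted.

```python
def linear_search(table, entry_size, count, key):
    """Linear search in the style of Listing 1.

    table is a flat sequence of bytes (or small ints); the first element of
    each entry is its key. Returns (offset, tests), where tests counts the
    comparisons made. Raises LookupError if all count entries are examined.
    """
    offset = 0
    for tests in range(1, count + 1):
        if table[offset] == key:          # CMP M / RZ
            return offset, tests
        offset += entry_size              # DAD D
    raise LookupError("table exhausted")  # JMP ERR
```

Searching a 500-entry table once for every key it holds takes 250.5 comparisons on the average, the "one half of the table" cost described above.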

An alternative to the linear search in- 
volves storing the information in a sorted 
fashion based upon the key. However, even
the best known algorithms for locating data
in a sorted table require an average of log2(N)
tests, where N is the table size. Therefore,
a table with, let's say, 500 entries requires an
average of nine tests to locate an arbitrary
item. Although this is a considerable
improvement over the linear search, which
would require an average of 250 tests to
locate an item, hashing techniques require
considerably fewer tests than either method,
without the added burden of sorting.

Figure 2: A Hash Accessed Table. Note that
with the hash algorithm described in the
text, three elements of this table map into
identical starting entries, resulting in a re-
hash requirement indicated by the arrows
and dotted lines.

The Key 

The fundamental idea behind any hashing 
technique is that instead of searching the 
table to determine the address of a particular 
entry, an attempt is made to calculate the 
address using the key. That is, a subroutine 
is written which, when given any desired 
key, calculates the table location containing 
the item associated with that key. If this 
calculation is successful, then the desired 
item is located with a single search. 

The first step is to determine the key. 
This choice will depend upon the intended 
use of the table. In the opcode table 
mentioned earlier, the opcode value is the 
key since all lookup requests are of the
form: "What is the mnemonic for the 
opcode X?" On the other hand, if this same 
table were incorporated in an assembler or 
compiler, then the mnemonic would be the 
key because requests are now of the form: 
"What is the opcode value for mnemonic 
X?". In all of our examples, we will assume 
that the opcode value is the key. 

Direct Access Hash 

Imagine that there are only a limited
number of opcode values and it so happens
that, although the value is eight bits long,
the opcode is uniquely determined by the
rightmost three bits. If a table, called
TABLE, is created with eight 4-byte entries,
and the mnemonic and value for each
opcode is placed in the table entry whose
address is found by multiplying the rightmost
three bits of the opcode by four and
adding the result to the base address of the
table, then a simple subroutine can calculate
the precise location of any entry. That
subroutine, shown in listing 2 for an 8080,
simply strips off the rightmost three bits of
the key, multiplies them by four, and adds
in the starting address of the table, as shown
in figure 1. Entries are added to the table in
the same manner. Tables of this type are
called direct access and are most commonly
used for conversions; that is, converting
from one character code to another, from
opcode values to mnemonics, etc. In many
direct access tables the actual key is not even
stored in the table, since a comparison is not
necessary to determine the proper entry.
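The direct access calculation is small enough to show in full. The following Python sketch builds the table of figure 1 as a byte array and applies the formula from listing 2. The names `build_table`, `entry_address` and `mnemonic` are hypothetical, and the first mnemonic is assumed to be NOP because its middle letter is hard to read in the figure.

```python
# Figure 1 contents: opcode value -> 3-letter mnemonic (the middle letter of
# entry 00 is hard to read in the figure; NOP is assumed).
MNEMONICS = ["NOP", "LDR", "STR", "ADD", "AND", "ROT", "CLR", "STP"]


def build_table():
    """Eight 4-byte entries: one opcode byte followed by three ASCII letters."""
    table = bytearray()
    for opcode, name in enumerate(MNEMONICS):
        table.append(opcode)
        table += name.encode("ascii")
    return table


def entry_address(base, a):
    """ADDR := BASE + 4 * (A & 7), the direct access formula of listing 2."""
    return base + 4 * (a & 7)


TABLE = build_table()


def mnemonic(opcode):
    addr = entry_address(0, opcode)  # BASE = 0, so addr is an offset into TABLE
    return TABLE[addr + 1:addr + 4].decode("ascii")
```

Because only the rightmost three bits are used, `mnemonic(0xF5)` returns the same entry as `mnemonic(5)`; no comparison is ever needed.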

Open Hash 

The direct access method would obviously
break down if certain opcode mnemonics
were associated with values whose rightmost
three bits were equal. In this case, where
direct access is infeasible, the algorithm
must be slightly modified. A subroutine is
still used to calculate the address, but since
it is no longer possible to successfully
calculate the location of all entries, some
type of searching algorithm must be
employed to pinpoint the position of the
entry, given the calculated position. The
initial predicted position of the table item is
called the hash index and the procedure
which produces the hash index is called the
hashing function. For the remainder of our
discussion, HASH is used to denote that
subroutine and therefore HASH(K) denotes
the hash index for a particular key, K.

Editor's Note:

In this article, we represent several
algorithms in a structured pseudo code
form appropriate to the discussion. These
algorithms are referenced by numbers in
brackets, as in [n] for algorithm n. Each
algorithm should be thought of as a formal
procedure, which in practice would be
called as a subroutine.

Listing 2: Typical 8080 code sequence for direct hash with a table of eight
entries, each entry being four bytes in length. In a direct hash approach, the
actual data value (in this case, a number from 0 to 7) being sought is used to
determine the offset in the table directly. Here the calculation is made
according to the formula: ADDR := BASE + 4 * (A & 7) where A is the value
of the entry being sought, BASE is the starting address of the table, and
ADDR is the effective address of the table element involved.

FIND:   LXI  H,TABLE   ;HL:=Table pointer
                       ;Extract rightmost three bits
                       ;Multiply by four
                       ;Add the table address
                       ;HL:=Entry address

Before considering how the information
is initially entered in a hash table, it may be
useful to examine the process used to locate
an arbitrary entry in a hash accessed table. If
KEY denotes the key associated with the
desired entry, and TABLE denotes a table
consisting of N entries (each of which is B
bytes long), then the algorithm to locate the
entry that is associated with KEY, using
hash techniques, is as follows:



[1]  1. I,J := HASH(KEY);
     2. do until (I=J-1) [worst case end test for search failure];
     3.    if @(TABLE + I * B) = 0 then [element not present, search failure]
     4.       do; call ERROR; return; end;
     5.    if @(TABLE + I * B) = KEY then
     6.       return [the item has been located];
     7.    I := I + 1;
     8.    if I = N then I := 0 [wrap around table space limit];
     9. end;
    10. call ERROR [element not present, search failure];



In this algorithm, specified in a structured 
pseudo code form, step 1 calculates an initial 
estimate of the location of the item associ- 
ated with KEY, the hash index. This value is 
saved in J for the worst case end test in the 
do until construct of step 2. In steps 3 and 
4, the algorithm tests for a null entry end of 
search criterion and calls an ERROR routine 
if this is detected. Return to the calling 
program follows detection and flagging of 
the search failure condition. Then the algo- 
rithm tests to see if the current entry is 
equal to KEY at step 5; if this condition is 
found, the algorithm terminates with a 
return operation at step 6. Otherwise, the 
next index is calculated at step 7, an end 
around wrap condition is tested at step 8, 
and the do loop is closed at step 9 with an 
end statement. If the loop execution ends
through the test in step 2, step 10 is reached
and an error condition is flagged before the
automatic return that is assumed after the
last line of such a procedure.

Consider again the opcode table example. 
If the hash procedure is defined as HASH(K) 
= REMAINDER(K/8), then each table item 
shown in figure 2 can be located by at most 
three searches using algorithm [1].
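Algorithm [1] translates almost line for line into Python. This sketch assumes keys are stored alone in a list rather than in B-byte entries, and its loop counts probes instead of testing I = J-1, so every entry (including the one just before the starting index) is examined before failure is reported; that termination test is a choice made here, not the article's. The sample table is hypothetical, with 04, 24 and 34 (hexadecimal) placed at entries 4, 5 and 6, as a linear search from HASH(K) = REMAINDER(K/8) would place them.

```python
EMPTY = 0  # null key marking an unused entry, so 0 cannot itself be a key


def hash_lookup(table, key, hash_fn):
    """Algorithm [1] with a linear rehash. table is a list of N keys.

    Returns the index of key, or raises KeyError on search failure.
    """
    n = len(table)
    i = hash_fn(key) % n                  # step 1: the hash index
    for _ in range(n):                    # worst case: every entry once
        if table[i] == EMPTY:             # steps 3-4: null entry, not present
            raise KeyError(key)
        if table[i] == key:               # steps 5-6: located
            return i
        i += 1                            # step 7
        if i == n:                        # step 8: wrap around
            i = 0
    raise KeyError(key)                   # step 10: table searched, not present
```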



Figure 3: Folding Keys.
When it is desired to retain
the significance of all the
bits in a key while com-
pressing the total number
of bits used, folding by
some operation such as ad-
dition can be used.



Defining HASH 

In choosing a hash function, you must 
attempt to define a general procedure, using 
a minimal number of simple computations, 
which produces an even distribution of hash 
indices for a random selection of possible 
keys. If we knew that all opcodes were even
numbers, then the hashing function
HASH(K) = REMAINDER(K/8) would not
be efficient, because it would produce only
even numbers. This simple example illus-
trates that the hashing function must be
carefully selected to suit the particular
application. It should also be noted that it is not
necessary for the key to be a numeric value.
If alphanumeric or other keys are used, the
hashing function should ignore the data type
and simply perform numeric or logical
manipulations of the key as though it were
numeric.

One of the most widely utilized, and 
historically the first, hashing function has 
already been mentioned. If N is the size of 
the table (in terms of the number of entries, 
not the number of bytes) the hash index is 
the remainder of the key divided by N. More
precisely stated, HASH(K) = REMAINDER(K/N).
In a machine such as the 8080, which lacks
division capability, this function will be made
significantly faster by restricting the length
of the table to a power of two (ie: N = 2^M).
If N = 2^M, then REMAINDER(K/N) also
happens to be the rightmost M bits of K and
a divide operation is no longer required. The
remainder is selected by a logical AND
operation.
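A two-line check in Python confirms that, for a power-of-two table size, the remainder and the AND of the rightmost M bits agree. The function names are illustrative only.

```python
def hash_remainder(key, n):
    """HASH(K) = REMAINDER(K/N), using a division."""
    return key % n


def hash_pow2(key, m):
    """The same index for N = 2**m: keep the rightmost m bits with an AND."""
    return key & ((1 << m) - 1)
```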

The remaindering function will not 
produce well distributed hash indexes if 
many of the entries end with the same bit 
sequence. This situation is frequently en- 
countered when dealing with alphanumeric 
data. Changing the table size to a prime 
number usually improves distribution, but 
now we are back to the unwanted divide 
operation for calculating the remainder. 
There are two other alternatives to this 
problem. The first is a technique called 
folding as diagrammed in figure 3. This 
method applies the remaindering algorithm 
to the bit string that is obtained by adding 
the upper half of the internal binary repre- 
sentation of the key to the lower half. This 
minor improvement minimizes the effect of 
patterns that may occur within the key. You 
should be careful what improvisations are 
made to the folding technique. For example, 
substituting a logical AND for the add 
sounds good, but will merely make matters 
worse. If in doubt, try experimenting with 
various keys by examining the effects of key 
value in a test program to grind out hash 
indices. 
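The effect is easy to try in the way the text suggests. In this Python sketch, assuming 16-bit keys and a table of 2^M entries, three hypothetical keys that share the same low-order byte all land on index 0 under plain remaindering, spread out when folded with an add, and collapse again when the add is replaced by a logical AND. The function names are illustrative.

```python
def fold16(key, m):
    """Folding: add the upper byte of a 16-bit key to the lower byte, then
    keep the rightmost m bits (remaindering for N = 2**m)."""
    folded = (key >> 8) + (key & 0xFF)
    return folded & ((1 << m) - 1)


def fold16_and(key, m):
    """The tempting variant the text warns against: AND instead of add."""
    return ((key >> 8) & (key & 0xFF)) & ((1 << m) - 1)


keys = [0x1200, 0x3400, 0x5600]          # keys sharing a low-order bit pattern
print([k & 7 for k in keys])             # plain remaindering: [0, 0, 0]
print([fold16(k, 3) for k in keys])      # folding spreads them: [2, 4, 6]
print([fold16_and(k, 3) for k in keys])  # AND-folding collapses them: [0, 0, 0]
```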
A second method for minimizing the
effect of similar bit patterns in the key, best
applied to tables of size 2^M, is called
squaring. This consists of selecting the center 
M bits of the number that is obtained by 
multiplying the key by itself. Since the 
middle bits of the product depend upon all 
of the bits in the key, this method generally 
produces a uniform distribution of hash 
indices. 
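A Python sketch of squaring for 8-bit keys follows; the 8-bit key width and the function name are assumptions. For M = 4 it keeps bits 6 through 9 of the 16-bit square, the middle of the product, which depends on every bit of the key.

```python
def mid_square(key, m, key_bits=8):
    """Squaring hash: the center m bits of key*key, for N = 2**m entries.

    key is assumed to fit in key_bits bits, so the square fits in 2*key_bits.
    """
    square = key * key
    shift = (2 * key_bits - m) // 2     # low-order bits to discard
    return (square >> shift) & ((1 << m) - 1)
```

For example, 90 * 90 = 0x1FA4, whose bits 6 through 9 are 1110, so `mid_square(90, 4)` is 14.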

Since the squaring method is safest, it 
may appear that one should always use it. 
This is certainly not the case because the 
purpose of hashing is to save processing time 
and although squaring is the most general 
technique, it is unfortunately the slowest 
since it relies on a multiply operation which 
the 8080 and many other small processors 
lack. It is often acceptable to settle on a 
slightly less efficient hash function if such a 
function is substantially faster. The guideline 
for selecting the hash function is to employ 
a more complex function only in those 
specific cases where a simple function fails 
to produce an adequate distribution of hash 
indices. But remember, any hash function is 
better than a linear search. Why? A linear 
search is a hash access where HASH(K)=0 
for all values of K, therefore any distribution 
is better than none. This degeneracy is 
evident in algorithm [1] when the data item
sought is not in the table, and the algorithm 
searches every location. 

Multibyte Hash 

Until now, we have tacitly assumed that 
the entire key can be contained in one byte. 
This is impractical, and the hashing concept 
is easily extended to cover those cases where 
the key occupies more than a single byte. If
the key is contained in byte locations
(Ki, ..., Kj), then a multibyte hash function,
HASHM, can be defined in terms of
any of the previous hash functions as
HASHM(Ki, ..., Kj) = HASH(Ki + ... + Kj). That is,
any of the single byte hash functions is
applied to the sum, ignoring carry, of the
bytes in the multibyte key. As you see in
figure 4, this is similar to the folding
technique just mentioned.

Another possibility for a multibyte hash, 
which should be used with some degree of 
caution since it may not provide an even 
distribution, is to apply a single byte hash 
function to the last byte (or any other byte 
of your choosing) of the multibyte key. This 
eliminates the time required to add the 
words of a multibyte key. As usual, the 
programmer is faced with a time versus 
efficiency tradeoff. 
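Both multibyte variants are sketched below in Python, assuming an ASCII character key and a table of 2^M entries; the names are illustrative. Because addition ignores byte order, anagrams such as CAT and ACT always share a hash index under HASHM, a pattern worth checking for with real keys.

```python
def hashm(key_bytes, m):
    """Multibyte hash: sum the bytes, ignoring carry (8-bit add), then take
    the rightmost m bits, i.e. the remainder for a table of N = 2**m entries."""
    total = 0
    for b in key_bytes:
        total = (total + b) & 0xFF      # discard the carry out of the byte
    return total & ((1 << m) - 1)


def hash_last_byte(key_bytes, m):
    """Faster but riskier: hash only the last byte of the key."""
    return key_bytes[-1] & ((1 << m) - 1)
```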

Guidelines 

In summary, the sole purpose of a hashing
function is to calculate an initial table
index for a linear search, given a specific
key. There is no one best algorithm and the
number of algorithms available is bounded
only by your imagination. The general guide-
lines to follow when designing your hashing
function are:

1. Keep it simple: remember, the goal is
to locate an item in the minimum
amount of time. If the perfect hash
requires more time than a linear
search, it is useless!

2. Ensure an even distribution; beware of
weird bit patterns in the key.

3. Check out the operation of the func- 
tion prior to employing it as a hash 
function. There is often an over- 
whelming urge to give it the smoke 
test, but hash indices are used to form 
memory addresses so it may be dif- 
ficult to isolate bugs in the hash 
function after you've incorporated it 
into a table lookup procedure. Save 
yourself some time, check the table 
lookup subroutines first. 

Building the Table 

Obviously, for the hash access algorithm 
to operate smoothly, the table items must 
have been entered into the table properly. 
The relative ease with which entries can be 
made in a hashed table is an important 
advantage of hash techniques. Remember, 
even though a sorted search is reasonably 
efficient for locating an entry, the entire 
table must be sorted before any access is 
allowed. Thus, if accesses were to be inter-
mixed with entries, the algorithm would be
grossly inefficient due to the amount of
re-sorting required.



Figure 4: The principle of folding key elements can be extended to a
multibyte key, whose bytes are combined into a single combined key. The
multibyte hashing scheme might be employed where a key is a character
string field.
Before any entries can be made in the
hash table, the key field of the table must be
initialized to some flag value which is not
encountered as a possible key. If a table
entry contains this value, then it can be
assumed that the entry is unoccupied. The
most common value used to designate an
empty table entry is the integer zero, and,
assuming this to be the case, the algorithm
to add an item associated with KEY to the
table of N entries (each B bytes long) is:

[2]  1. I,J := HASH(KEY);
     2. do until (I=J-1) [worst case end test for search failure];
     3.    if @(TABLE + I * B) = 0 then
     4.       do;
     5.          [enter the item at (TABLE + I * B)];
     6.          return;
     7.       end;
     8.    I := I + 1;
     9.    if I = N then I := 0 [wrap around table space limit];
    10. end;
    11. call ERROR [no room left in table];

Notice that the lookup algorithm [1] and
the entry algorithm [2] are very similar in 
nature. The loop control is identical, and the 
only difference is in the actions taken. It is 
quite possible to make an automatic entry 
occur whenever a key is not found as 
indicated by a null key value found during a 
search. The following algorithm combines 
both operations. 

[3]  1. I,J := HASH(KEY);
     2. do until (I=J-1) [worst case end test for search failure];
     3.    if @(TABLE + I * B) = KEY then
     4.       return [the item has been located];
     5.    if @(TABLE + I * B) = 0 then
     6.       do;
     7.          [enter the item at (TABLE + I * B)];
     8.          return;
     9.       end;
    10.    I := I + 1;
    11.    if I = N then I := 0 [end wrap around table space limit];
    12. end;
    13. call ERROR [if this point is reached, table is full];
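The combined lookup-and-enter procedure [3] can be sketched in Python as below. This assumes a table of bare keys in a list, uses a probe count rather than the I = J-1 test to stop, and reports a full table by raising an exception; all of these are choices made for the sketch, and `find_or_insert` is a hypothetical name.

```python
EMPTY = 0  # null key marking an unused entry


def find_or_insert(table, key, hash_fn):
    """Algorithm [3]: locate key, or enter it at the first null entry met.

    Returns (index, inserted). Raises OverflowError if the table is full.
    """
    if key == EMPTY:
        raise ValueError("0 is reserved to mark unused entries")
    n = len(table)
    i = hash_fn(key) % n
    for _ in range(n):
        if table[i] == key:            # steps 3-4: already present
            return i, False
        if table[i] == EMPTY:          # steps 5-8: enter the item here
            table[i] = key
            return i, True
        i += 1                         # steps 10-11: next entry, with wrap
        if i == n:
            i = 0
    raise OverflowError("table is full")   # step 13
```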



In addition to adding or locating entries,
it may also be necessary to delete entries. To
delete an item, you might think that we
could merely locate the item and then set
the table entry to zero, thus making it
available for future entries. However, if that
approach were taken, not only would the
desired entry be deleted, but other entries
might be made inaccessible. The reason that
other entries would be lost is that the
searching terminates when an unused loca-
tion is found. As an example, setting the
entry at (TABLE + 20) in figure 2 to zero
would also make the entry at (TABLE + 24)
inaccessible. Therefore, an alternate scheme
must be used to delete entries.

Figure 5: Horizontal Organization of Tables. In
this method of organization, all the bytes of a data
entry are assigned to contiguous addresses.

Table 1: Comparison of Table Access Methods. This table gives the results of
an experiment with random data to test out the various methods of access.
The tables were filled to the percentage levels indicated at the left. A table
size of 500 possible entries was used. The access methods shown are described
in the text.

The first step is to select a deleted entry 
flag that is distinguishable from the unused 
entry flag and is also not allowable as a key. 
Then, whenever an entry is to be deleted, this
new value replaces the entry. The new flag
indicates that the entry is available for
future additions to the table but does not
terminate a search operation. If 0 is used to
denote an unused entry and -1 is used to
denote a deleted entry, then the complete
hashing algorithm is:



[4]  1. I,J := HASH(KEY);
     2. do until (I=J-1) [worst case end test for search failure];
     3.    if @(TABLE + I * B) = KEY then
     4.       do;
     5.          if [entry is to be deleted] then [delete the entry]
     6.             @(TABLE + I * B) := -1;
     7.          return [item has been located];
     8.       end;
     9.    if @(TABLE + I * B) = 0 then [this is a null entry so]
    10.       do;
    11.          [enter the item at (TABLE + B * I)];
    12.          return;
    13.       end;
    14.    I := I + 1;
    15.    if I = N then I := 0 [end wrap around table space limit];
    16. end;
    17. call ERROR [if this point is reached, table is full];

This algorithm either locates an item or 
adds the item to the first available location. 
If an item is to be deleted it is first located 
and then the key field of the table entry 
is set to -1.
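Here is a Python sketch of algorithm [4] using the flags from the text: 0 for an unused entry and -1 for a deleted one. The function name and the list-of-keys table are assumptions, and one deviation is labelled in the code: asking to delete a key that is not present raises an error instead of adding the key. The usage shows why the separate flag matters: after opcode 24 is deleted, 34 can still be found beyond it.

```python
EMPTY, DELETED = 0, -1  # flags; neither may be used as a key


def access(table, key, hash_fn, delete=False):
    """Algorithm [4]: locate key (deleting it if asked) or add it at the
    first null entry reached. Returns the index used."""
    n = len(table)
    i = hash_fn(key) % n
    for _ in range(n):
        if table[i] == key:
            if delete:
                table[i] = DELETED     # mark, don't clear: later entries stay reachable
            return i
        if table[i] == EMPTY:
            if delete:                 # sketch's choice: don't add a key being deleted
                raise KeyError(key)
            table[i] = key
            return i
        i = (i + 1) % n                # linear rehash; DELETED does not stop the search
    raise OverflowError("table is full")
```

Note that a new key such as 14 still goes to the first null entry (7), not to the deleted entry 5; algorithm [4] does not reclaim deleted entries.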
Figure 6: Vertical Organization of Tables. In
this method of organization, a multibyte
table element is treated as "n" single byte
subtables where "n" is the number of bytes
in each entry. Each of the "n" subtables has
a length (in bytes) equal to the number of
elements in the table.



Collisions 

A collision occurs whenever
HASH(KEY1) = HASH(KEY2), but KEY1
≠ KEY2. As discussed earlier, a good hash-
ing function will minimize this condition, but
the problems caused by collisions cannot be
ignored. Note for example that the hash
index for opcodes 04, 24 and 34 in the table 
shown in figure 2 is 4 and hence these 
entries collide. 

What happens when two entries collide? 
The only solution we've discussed thus far is 
to search the table, in a circular fashion, 
from the point of impact as in algorithms 
[1] to [4]. If, in general, a collision occurs, 
then the resulting search, good or bad, is 
called a rehash. The process mentioned 
above, namely, searching the table in a 
circular fashion from point of impact, is 
called a linear rehash, and as you might 
expect falls into the bad category. Other 
more efficient algorithms will be discussed 
later. 

If we denote the rehashing algorithm by 
REHASH, then the general hashing lookup 
algorithm may be restated in its final form: 



[5]  1. I,J := HASH(KEY);
     2. K := 0;
     3. do until (REHASH(I,J) = J) [worst case end test for search failure];
     4.    if @(TABLE + I * B) = KEY then [we have a match so]
     5.       do;
     6.          if [entry is to be deleted] then [delete the entry]
     7.             @(TABLE + I * B) := -1;
     8.          return;
     9.       end;
    10.    if ((K = 0) & [deletion or null element @(TABLE + I * B)]) then
    11.       K := I [save last available table entry index];
    12.    if @(TABLE + I * B) = 0 then [this is a null entry so]
    13.       do;
    14.          [enter the item at (TABLE + B * K), next available slot];
    15.          return;
    16.       end;
    17.    I := REHASH(I,J) [REHASH results in 0 <= I < N where N is table size];
    18. end;
    19. call ERROR [if this point is reached, table is full];

The linear rehash that we've been using
implicitly in [4] as steps 14 and 15 is
described as REHASH(I) = (I+1)[mod N],
where (I+1)[mod N] means that if (I+1) is
greater than or equal to N, then N is
subtracted from the value (I+1). This ensures
that the table is searched in a circular
manner. The operation X[mod N], called X
modulo N, is used in most rehashing algo-
rithms to limit the range. Mathematically, it
is the remainder of X/N; but whenever we
use X[mod N] it can be calculated as
described above (ie: subtract N if X is
greater than or equal to N), which is valid
as long as X is less than 2N. Here again we
have avoided the use of a divide operation to
provide a more efficient function. Note that
step 10 includes a check which reclaims
deleted entries, a process not included in
algorithm [4].
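A Python sketch of the general algorithm [5] follows, with the linear rehash written as a conditional subtraction. It assumes a table of bare keys; the names `linear_rehash` and `access5` are hypothetical. The saved index K is kept as `None` until an available entry is seen, a small departure from testing K = 0, which would confuse "nothing saved yet" with entry 0. The rehash is passed in as a function, so a different probe rule that takes (I, N) could be substituted.

```python
EMPTY, DELETED = 0, -1  # unused and deleted-entry flags


def linear_rehash(i, n):
    """REHASH(I) = (I+1)[mod N], with the mod done as a conditional subtraction."""
    i += 1
    if i >= n:
        i -= n
    return i


def access5(table, key, hash_fn, rehash=linear_rehash, delete=False):
    """Algorithm [5]: like [4], but a new key goes into the first deleted or
    null entry seen along its search path (the text's K)."""
    n = len(table)
    i = j = hash_fn(key) % n
    k = None                          # None rather than 0, so entry 0 can be saved too
    while True:
        entry = table[i]
        if entry == key:
            if delete:
                table[i] = DELETED
            return i
        if k is None and entry in (EMPTY, DELETED):
            k = i                     # step 11: remember the first available entry
        if entry == EMPTY:
            if delete:                # sketch's choice: don't add a key being deleted
                raise KeyError(key)
            table[k] = key            # step 14: reuse the earliest free entry
            return k
        i = rehash(i, n)
        if i == j:                    # step 3: back at the start, all entries seen
            raise OverflowError("table is full")
```

Unlike [4], a key added after a deletion reuses the deleted entry: in the usage below, 14 lands in entry 5, where 24 used to be.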

Improved Rehash 

The problem with the simple linear re- 
hash is that the table will not fill uniformly. 
This condition is referred to as clustering 
and causes an increase in the average number 
of tests required to locate an item in the 
table. As an example, a cluster can be seen 
forming at TABLE+16, +20, and +24 in the 
table shown in figure 2. 

There are a number of nonlinear algo- 
rithms which perform the rehash function 
without causing the clustering problems 
mentioned above. Although the computer 
science literature abounds with such algo- 
rithms, a majority of them fall into one of 
three classes. An attempt has been made to 
select the simplest and best from each class 
and present them here. 

Pseudorandom Rehash 

The first class of rehashing algorithms is 
the pseudorandom rehash and is based upon 
a pseudorandom number generator. The 
pseudorandom number generator used is not
important, but it must be of the nonrepeating
variety. That is, it must generate
all possible values before any previous value
is repeated. It must also generate all of the
integers in the range 0, ..., N-1, where N is the
table size. The following simple procedure
incorporates a common random number
generator and will perform the rehash func-
tion for any table of size N = 2^M. The
variable R is internal to the rehashing func-
tion, but it must be preset to one whenever
the function HASH is initiated (ie: step 1 of
algorithm [5]).

[6] REHASH(I,J):
    1. R := REMAINDER(R*5 / (N*4));
    2. REHASH := (R/4 + J) [mod N];

If you're seeking the most efficient imple-
mentation of this one,
REMAINDER(R*5/(N*4)) is just the rightmost M+2
bits of R*5, because N = 2^M and
4*N = 2^2 * 2^M = 2^(M+2). Furthermore, the divide
operation in step 2 can be replaced by a
right shift of two positions. Finally, if you
think of R*5 as R*4+R, then it's easy to see
how to reduce that multiply operation to
left shift and addition operations.

Let's look at the sequence generated by
this rehash routine. If our table is eight
entries long and the initial hash index is, let's
say, 4, then R/4 takes on the values
1,6,7,4,5,2,3 (followed by 0, which brings the
search back to the initial index), so the table would be
searched in the order 4 (initial index),
5,2,3,0,1,6,7. How does this avoid the clus-
tering situation? If we chose another initial
index, say, 5, then the table is searched in
the order 5 (initial index),6,3,4,1,2,7,0. As
you see, the entry searched after entry 5 will
depend upon the initial index. If the initial
index was 4, then 2 is searched after 5; but if
the initial index was 5, then 6 follows 5. In a
linear search, 6 always follows 5. This
dependence upon the initial index is what
avoids the clustering.
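The shift-and-add version of [6] described above is sketched here in Python. The function name is hypothetical, and it returns the whole search order at once rather than one index per call. Running it for N = 8 reproduces the two search orders given in the text.

```python
def pseudorandom_probes(j, m):
    """Search order for the pseudorandom rehash [6] on a table of N = 2**m.

    R starts at one; each step computes R := REMAINDER(R*5 / (4*N)) with
    shifts, adds and a mask, then the next index is (R/4 + J) mod N.
    """
    n = 1 << m
    r = 1
    order = [j]
    while True:
        r = ((r << 2) + r) & ((1 << (m + 2)) - 1)  # R*5, keep rightmost M+2 bits
        i = ((r >> 2) + j) & (n - 1)               # (R/4 + J) [mod N]
        if i == j:                                 # R/4 = 0: back at the start
            return order
        order.append(i)
```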



Quadratic Rehash 

A second class of algorithms for rehashing 
is the quadratic rehash and these are based 
upon a quadratic function. The major draw- 
back with most algorithms in this class is 
that they search only one half of the table, 
so two different rehashing algorithms are 
required. The most efficient quadratic re-
hash, and one which does search the entire
table, was first introduced by Colin Day [see
bibliography, reference 7]. Day's algorithm
can only be applied to a table whose size is a
prime number that produces a remainder of
3 when it is divided by 4 (eg: 7=4*1+3,
11=4*2+3). At first glance, this appears to
place a great many restrictions on the
allowable size of the table; but don't despair,
because experience will show that a number
satisfying the required condition can be
found very near any desired value. Be certain
that you use an acceptable number or the
procedure will not search all locations of the
table. Like the last rehashing function, an
internal variable is used. The variable, R,
must be preset to (-N) whenever the func-
tion HASH is called. The quadratic rehash
process is (remember that the mod operation
is just a conditional subtraction):



[7] REHASH(I,J):
    1. R := R + 2;
    2. REHASH := (I + |R|) [mod N];



If we look at the sequence generated by
this procedure, we see that R takes on the
values (for a table of size 11 = 4*2+3)
-11,-9,-7,-5,-3,-1,1,3,5,7,9,11. There-
fore, if the initial index were 4, the table
would be searched in the order: 4 (initial
index), 2,9,3,6,7,8,0,5,1,10. One major
difference between this algorithm and the
random rehash is that this one calculates the 
next index based on the previous one. The 
random rehash calculates the next index 
based on the initial index. 
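The quadratic rehash [7] is sketched below in Python; the function name is hypothetical, and it returns the whole search order. With N = 11 it reproduces the order listed above. It also makes the table-size requirement easy to test: with N = 7 (which, like 11, leaves a remainder of 3 when divided by 4) every entry is reached from any starting index, while with N = 5 some entries are never searched.

```python
def quadratic_probes(j, n):
    """Search order for Day's quadratic rehash [7] on a table of n entries.

    R is preset to -N; each step does R := R + 2 and I := (I + |R|) [mod N],
    the mod being a conditional subtraction (I + |R| is always below 2N here).
    """
    r, i = -n, j
    order = [j]
    for _ in range(n - 1):
        r += 2
        i += abs(r)
        if i >= n:
            i -= n
        order.append(i)
    return order
```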
Weighted Increment Rehash

The last, and probably the simplest, 
method for performing the rehash is called a 
weighted increment [see bibliography, 
reference 2]. This one is unique because it
uses the hash index to calculate an incre- 
ment which is in turn used to step through 
the table. The table size is again restricted to 
a power of 2, and whenever the function 
HASH is called, the variable R is preset to 
(2*J+1)[mod N], where J is the initial hash 
index. The weighted increment method is: 

[8] REHASH(I,J):
    1. REHASH := (I + R) [mod N];

This process is very much like a linear 
rehash. In fact if R were always set to 1 it 
would be a linear rehash; however R depends 
on the hash index. If our table is eight 
entries long and the initial index is 5, then
R = (2*5+1)[mod 8] = 11-8 = 3 and the table
items are searched in the order
5,0,3,6,1,4,7,2. Since the increment is a
constant for any particular hash index, we
can improve the basic hash algorithm when
using this rehash technique. You will notice
that all memory references are of the form
(TABLE+I*B), where B is the number of
bytes per entry. We can avoid that multiply by in-
cluding it in the computation of R. If we let
R=((2*J+1)[mod N])*B, then all of the
table references become (TABLE+I). If we
also initialize I to TABLE+HASH(KEY)*B, we
can make all references as just (I).
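The weighted increment rehash [8] and the byte-offset refinement just described can both be sketched in Python. The function and its `entry_bytes` parameter are assumptions. With the default of one byte per entry it reproduces the search order 5,0,3,6,1,4,7,2; with B = 4 it produces the same order as byte offsets, adding a pre-scaled increment instead of multiplying inside the loop.

```python
def weighted_probes(j, m, entry_bytes=1):
    """Search order for the weighted increment rehash [8] on N = 2**m entries.

    R := (2*J + 1) [mod N] is odd, so stepping by R visits every entry once.
    With entry_bytes = B > 1 the results are byte offsets I*B, showing the
    trick of folding B into the increment so no multiply is needed per probe.
    """
    n = 1 << m
    size = n * entry_bytes                       # table length in bytes
    r = ((2 * j + 1) & (n - 1)) * entry_bytes    # increment, pre-scaled by B
    i = j * entry_bytes
    order = [i]
    for _ in range(n - 1):
        i += r
        if i >= size:                            # [mod N*B] by conditional subtraction
            i -= size
        order.append(i)
    return order
```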

Laying Doubts to Rest 

You might conceivably ask, "What is
gained by using a complex rehashing func-
tion?" or, if you're one of the more cynical
observers, "Why use hashing at all?" In an
attempt to answer these questions, a simple
experiment was performed. First a table of
approximately 500 entries was filled with 
randomly generated entries and then each 
entry was located in the table using the 
lookup technique under test. This simple 
experiment provides an insight into the 
comparative efficiency of table lookup algo- 
rithms. Table 1 summarizes the results of the 
experiment. This data clearly illustrates that 
there is significant improvement in table 
lookup time when hashing is utilized. Fur- 
thermore, when a complex rehashing algo- 
rithm is incorporated in the search pro- 
cedure, the statistics are again improved. It is 
worth noting again that, although the num- 
ber of tests for a sorted table is not 
tremendously large, the approach is very 
inefficient if the table must be accessed 
before being filled with entries. 

One other surprising fact about the average search length (the number of tests required) for hash accessed tables is that it does not depend upon the length of the table. Rather, the search length depends only upon the load factor, the percentage of occupied items in the table. This means that you can expect the average search time for a table of size 10,000 to be about the same as the search time for a table of size 500! This is surely not the case with the linear or sorted search. While the average linear search length skyrockets to 4,500 (for a 90% full table of size 10,000), the average hash search length remains at less than six!
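The arithmetic behind those two figures can be checked in a few lines. A successful linear search examines half of the occupied entries on average, so 9,000 entries give 4,500 comparisons. For hashing, the sketch assumes one classical estimate, Knuth's approximation for a successful search with linear rehash, 1/2 (1 + 1/(1 - a)) where a is the load factor. The text does not say which rehash produced its figure, but this estimate gives 5.5 at 90% load and, like the text's claim, involves the load factor only.

```python
def linear_search_length(occupied: int) -> float:
    """Average comparisons for a successful linear search of `occupied` entries."""
    return occupied / 2


def linear_rehash_length(load: float) -> float:
    """Knuth's approximation for a successful search with linear rehash.

    Note that the table size does not appear, only the load factor.
    """
    return 0.5 * (1 + 1 / (1 - load))


for size in (500, 10_000):
    occupied = size * 9 // 10                  # 90% full
    print(size, linear_search_length(occupied), round(linear_rehash_length(0.9), 2))
```

The linear figure grows with the table while the hash figure stays put.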
Although table 1 seems to indicate that the weighted increment is most efficient, we must be careful not to read too much into these results. The statistics in table 1 were obtained using randomly generated keys in the test program. When actual keys are used, the search statistics will vary somewhat because actual keys are rarely perfectly random. For example, the search length for a weighted increment search is adversely affected by bit patterns in the key. The best way to ensure that you are using the best search procedure is to repeat the experiment with a sample of actual keys. If a finely tuned algorithm is not important, then the weighted increment is probably the better choice because it is simple and can be applied to any format of table. As we will see shortly, most of the algorithms work best if the table is rearranged in memory.



Application 

There are a number of "tricks" which can be used to improve efficiency. A number of them have already been mentioned. Throughout our discussion we have assumed that each table entry occupies more than a single byte. If each table entry is B bytes long, then the typical memory reference is (TABLE+I*B). It would be desirable to eliminate, or at least reduce, the multiply operation. We already discussed how to eliminate the multiply if a weighted increment rehash is used. Another method to eliminate a multiply is table reorganization.

All of the tables discussed so far were horizontally organized. This means that the items are stored as shown in figure 5. This is the most common table organization. An alternative organization is a vertical organization such as in figure 6. If you have organized your table vertically, then the first byte of an item is addressed by (TABLE+I) and the multiply is gone. All of the other bytes in the item are addressed by (BYTEN+I), where BYTEN is the address of the nth byte of the first item. Thus, by organizing the data vertically we eliminate a multiply operation. This vertical arrangement is practical from other aspects also. Consider searching the table for all items containing a specific value in the third byte. Since the third byte of each item is stored sequentially, this search operation is simplified.
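The difference between the two layouts can be sketched with two byte arrays holding the same items. The item size B=3, the item count N=4 and the helper names `put` and `third_bytes` are hypothetical. The sketch shows that the vertical layout reaches byte n of item I as BYTEN+I, with BYTEN fixed when the table is laid out, and that all third bytes form one contiguous run.

```python
B = 3          # bytes per item (hypothetical)
N = 4          # number of items (hypothetical)

horizontal = bytearray(N * B)   # item 0 bytes, then item 1 bytes, ...
vertical = bytearray(N * B)     # byte 0 of every item, then byte 1, ...

# BYTEN for each byte position n: the address of byte n of the first item.
# In the vertical layout these are constants fixed when the table is laid out.
BYTEN = [n * N for n in range(B)]


def put(item: int, values: bytes) -> None:
    """Store one item in both layouts."""
    for n, v in enumerate(values):
        horizontal[item * B + n] = v     # TABLE + I*B + n: needs a multiply
        vertical[BYTEN[n] + item] = v    # BYTEN + I: just an addition


def third_bytes() -> bytes:
    """Third byte of every item: one contiguous run in the vertical layout."""
    return bytes(vertical[BYTEN[2]:BYTEN[2] + N])


for i, item in enumerate([b"ABC", b"DEF", b"GHI", b"JKL"]):
    put(i, item)
print(third_bytes())   # b'CFIL'
```

Scanning for a value in the third byte is then a plain sequential scan of `third_bytes()`.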

Conclusion 

We have tried to show that hashing is not 
nearly as complicated as you might have 
thought. By using these techniques perhaps 
you can regain a valuable slice of your 
microprocessor's processing load.



GLOSSARY 



Clustering: Grouping of elements within a table 
caused by equal hash indices. 

Collision: Two elements with the same hash index. 

Direct access hash: A hash algorithm which pre- 
cludes collision. That is, no two elements have 
identical hash indices. 

Disassembler: A program to translate object code 
to assembly language, inverse of an assembler. 

Folding: Procedure for randomizing the hash 
index. The upper and lower half of the key are 
added together before the index is calculated. 

Horizontal table: A table whose entries are stored 
sequentially. That is, (entry one, byte one), (entry
one, byte two), etc.

Hash index: The initial estimate of the location of 
an entry within the table. 

Hashing: A nonlinear algorithm for storing/retrieving
data in a table.

Hashing function: The algorithm or procedure for 
calculating the hash index. 

Key: Field within an entry that is used to locate 
the entry. For example, surnames are the key field 
of the entries of a telephone directory. 

Linear rehash: A method for resolving collisions.
The table is searched sequentially from the point
of impact.

Linear search: Table search which examines each 
item starting with the first item and proceeding 
sequentially. 

MOD: Remainder of one number divided by 
another. That is, X MOD Y is the remainder of 
X/Y. 

Pseudorandom rehash: A method for resolving 
collisions. A nonrepeating random number genera- 
tor is used to determine the next entry to be 
searched. 

Quadratic prime: A prime number which produces 
a remainder of 3 when divided by 4. 

Quadratic rehash: A method for resolving col- 
lisions. A quadratic or second degree function is 
used to determine the next entry to be searched. 

Rehash: Any algorithm for resolving collisions. 

Squaring: Procedure for randomizing the hash 
index. The key is multiplied by itself before the 
hash index is computed. 

Vertical table: A table where the bytes of each
entry are stored sequentially. That is, (entry one,
byte one), (entry two, byte one), etc. FORTRAN
stores arrays in this manner.

Weighted increment rehash: A method for resolv- 
ing collisions. The hash index is used to determine 
the next entry to be searched. 
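Two of the randomizing procedures and the quadratic prime defined above can be sketched briefly. This assumes 16-bit keys, assumes "the index is calculated" means taking the result MOD N (as in the MOD entry), and uses hypothetical function names; practical squaring schemes often extract middle bits of the square instead.

```python
def fold_index(key: int, n: int) -> int:
    """Folding: add the upper and lower bytes of a 16-bit key, then MOD n."""
    return ((key >> 8) + (key & 0xFF)) % n


def square_index(key: int, n: int) -> int:
    """Squaring: multiply the key by itself, then MOD n."""
    return (key * key) % n


def is_quadratic_prime(p: int) -> bool:
    """True for a prime that leaves a remainder of 3 when divided by 4."""
    if p < 2 or p % 4 != 3:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True


print(fold_index(0x1234, 101), square_index(12, 7), is_quadratic_prime(11))
```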
Improving Quadratic Rehash



"Making Hash With Tables" by Terry Dollhoff [BYTE, January 1977, page 78]¹ is a good introduction to hash tables. However, quadratic methods for collision avoidance do not have to be complicated or suffer from "half table search." If the table length is a power of 2 and the quadratic increment is 3, as in the following simple and fast algorithm, then none of the table will be excluded from the search.

I have been using this scheme since about 1970 but have never seen it reported in the literature. Your readers may write to me for a copy of the proof that it works.



John F Herbster 



A Quadratic Hash Table 

The following algorithm assumes that the table length is a power of 2, the table words were initialized to VIRGIN, and MASK has a value equal to the table length minus 1.

1. Set DEL to 0.

2. Set I to hash code of KEY.

3. Let I=I.AND.MASK (i.e., AND I with MASK).

4. If TABLE(I)=VIRGIN then go to NOTFOUND. (Note that TABLE(I) refers to the contents of location TABLE+I.)

5. If TABLE(I)=KEY then go to FOUND.

6. Let DEL=(DEL+3).AND.MASK.

7. If DEL=0 then go to FULL. (Note that DEL gets back to 0 only after the whole table has been searched.)

8. Let I=(I+DEL).AND.MASK.

9. Go to step 4.

On return to the user's program via the NOTFOUND or FOUND exits, the index, I, will point to the spot for a new table entry or to the found entry, respectively. The FULL return means the KEY was not found and that the table is full. Note that the value VIRGIN may not equal any possible value of KEY.
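Here is a Python sketch of the nine steps, reading step 6 as DEL=(DEL+3).AND.MASK to match the letter's "quadratic increment is 3". A Python list stands in for the table and `None` stands in for VIRGIN. The names `quadratic_lookup` and `insert` are hypothetical. Because 3 is odd, the cumulative offsets 3, 9, 18, 30, ... (three times the triangular numbers) visit every slot of a power-of-2 table before DEL returns to 0.

```python
VIRGIN = None  # marks an unused slot; must never equal a real KEY


def quadratic_lookup(table: list, key, hash_code: int):
    """Steps 1-9 above.

    len(table) must be a power of 2. Returns ("FOUND", i), ("NOTFOUND", i)
    where i is the free slot for a new entry, or ("FULL", None).
    """
    mask = len(table) - 1
    delta = 0                          # step 1
    i = hash_code & mask               # steps 2 and 3
    while True:
        if table[i] is VIRGIN:         # step 4
            return "NOTFOUND", i
        if table[i] == key:            # step 5
            return "FOUND", i
        delta = (delta + 3) & mask     # step 6
        if delta == 0:                 # step 7: every slot has been tried
            return "FULL", None
        i = (i + delta) & mask         # step 8, then back to step 4


def insert(table: list, key, hash_code: int) -> bool:
    """Hypothetical helper: store key if absent; False only when the table is full."""
    status, i = quadratic_lookup(table, key, hash_code)
    if status == "NOTFOUND":
        table[i] = key
    return status != "FULL"


table = [VIRGIN] * 8
for k in range(8):
    insert(table, k, hash_code=0)      # every key collides on purpose
print(table)                           # every slot is now occupied
print(quadratic_lookup(table, 99, 0))  # ('FULL', None)
```

Even with every key colliding, all eight slots get filled before FULL is reported.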



¹ Page 84 in this edition.
The Care and Feeding

of Binary Trees



Cam Fame!! 



Many computer applications require 
sorting, searching, or both. While hashing is a 
useful tool for maintaining tables, it does 
not lend itself to particularly dense tables 
and is of no use in sorting. This article 
describes the use of binary trees to perform 
both sorting and searching at high speed 
with modest overhead. 

Although numerous applications require 
sorting and searching, a good example that 
requires both is the label table in an assem- 
bler. During assembly, fast access is required 
each time a label is referenced, and, at the 
end, the table must be sorted if it is to be 
listed in alphabetical order. 

Like all things, binary trees have advan- 
tages and disadvantages. They deal best with 
large amounts of data encountered in 
random order. This property makes them 
ideal for use in assemblers and other applica- 
tions with similar data. Binary trees provide 
a method which is just the opposite of 
sequential searching or bubble sorting, which 
are at their best with small amounts of data. 



Recognizing Binary Trees

The term 'binary tree' refers to a method of linking data records together in memory. Typically the records remain in memory in the same order in which they were encountered. Pointers are used to link them together in an ordered manner. The creation and use of these pointers is the heart of a binary tree.

In an assembler, for example, each label entry might contain the label itself, a one byte flag and a 16 bit value, as shown in figure 1. In order to be used as a binary tree, each entry must include two address pointers of 16 bits each, for a total of four bytes. These pointers are called up and down pointers since they will be used to point to higher and lower entries in the tree (more about that later). Our example now looks like figure 2.

Figure 1. Possible data arrangement for label entries for an assembler label table.

Figure 2. To use the binary tree structure, each entry must also have two pointers to point to the entry before it and the one after it.

In most discussions of trees, they are called 'trees' but are drawn upside down like roots. Others say this is silly and draw their trees right side up. In an effort to please all and offend none, trees are represented here on their sides. This has the advantage that up and down references, when applied to pointers, actually mean up and down in the figure, which is not true with the other representations.

A simple tree structure following our example would appear as in figure 3. For simplicity, only the label and the pointers have been shown. This tree is already built; we'll get to the logic on how to build one shortly.

Figure 3. Simple binary tree structure. The first label encountered is MAT. This label leads up to GO or down to RUN, both of which end the tree.

The pointers link the tree together. The up pointer in each entry points to the part of the tree that is above the current entry, while the down pointer points to the part of the tree below it.

Null pointers are pointers which don't 
point anywhere. These are shown in the 
diagram as asterisks (*) and are usually 
represented in memory as zero. Null pointers 
indicate that a particular entry is the end of 
a branch. 

Searching a Binary Tree 

To search for an item in a binary tree, start with the first item (MAT in the example) and compare it to the item being looked for. If the item being searched for is above the current entry in the sort sequence, follow the up pointer; if it is below, follow the down pointer. The procedure is repeated until either a match is found or a null pointer is encountered. The latter case indicates that the item searched for isn't in the tree. The logic for searching such a tree is shown in listing 1.

To search the tree for the word GO, start at MAT and perform a comparison. Since GO is above MAT, take the up pointer from MAT, which points to GO, producing a match and stopping the search.
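The search logic can be sketched in Python with the entries kept in a list in arrival order and pointers held as list indices. Index 0 plays the part of the null (zero) pointer and is never used for an entry. The `Entry` class and the `search` function are illustrative names, not part of the article, and the tree is assumed to be non-empty. When the label is missing, the function reports which null pointer ended the search, just as listing 1 leaves the search pointer on it.

```python
from dataclasses import dataclass

NULL = 0  # null pointer; slot 0 of the table is never used for an entry


@dataclass
class Entry:
    label: str
    up: int = NULL    # entries that sort before this one
    down: int = NULL  # entries that sort after this one


def search(tree: list, wanted: str):
    """Follow up/down pointers from the first entry.

    Returns ("found", index) or ("not found", (index, field)), where field
    names the null pointer ("up" or "down") that ended the search.
    """
    current = 1                                # first tree entry
    while tree[current].label != wanted:
        if tree[current].label > wanted:       # wanted is above: go up
            if tree[current].up == NULL:
                return "not found", (current, "up")
            current = tree[current].up
        else:                                  # wanted is below: go down
            if tree[current].down == NULL:
                return "not found", (current, "down")
            current = tree[current].down
    return "found", current


# Figure 3: MAT leads up to GO and down to RUN.
tree = [None, Entry("MAT", up=2, down=3), Entry("GO"), Entry("RUN")]
print(search(tree, "GO"))    # ('found', 2)
print(search(tree, "NUM"))   # ('not found', (3, 'up'))
```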

Growing Binary Trees 

Once we have the routine to search a tree, adding items is easy. To add an item to the tree, first search for it to make sure the item isn't already there. Using the example above, to add the label NUM, first search the tree for it. Comparing with MAT indicates that NUM is below, so follow the down pointer, which points to RUN. Comparing with RUN indicates that NUM is above RUN, so follow the up pointer from RUN. But since the up pointer is a null pointer, there is no match and the search is terminated. This null pointer, however, is the one that should point to our new entry. To add NUM, simply build an entry for it at the end of the table (with both its pointers set to null) and then adjust the null up pointer from RUN to point to the new entry for NUM. The tree is now as shown in figure 4. The logic for adding a new item to the tree is shown in listing 2.

    (search pointer = address of first tree entry)
    DO UNTIL (current entry = required entry)
        IF (current entry > required entry) THEN
            IF (current up-pointer = 0) THEN
                (search pointer = address of current up-pointer)
                (return signaling 'not found')
            ELSE
                (search pointer = current up-pointer)
            ENDIF
        ELSE
            IF (current down-pointer = 0) THEN
                (search pointer = address of current down-pointer)
                (return signaling 'not found')
            ELSE
                (search pointer = current down-pointer)
            ENDIF
        ENDIF
    ENDDO
    (return signaling 'found')

Listing 1. Binary tree search logic. This logical routine searches the binary tree until the label being looked for is found or a null pointer is encountered. This listing expresses the logic in a "pseudo code" language.

The procedure given above tells how to 
add items to an existing tree but doesn't tell 
how to get the tree started in the first place. 
To start a tree, take the first item to be 
added to the tree and build an entry for it 
with both its pointers set to null. Since it is 
the first item in the tree there are no poin- 
ters to it. 
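Listing 2 and the rule for starting a tree can be combined in one short Python sketch. As before, entries are appended to a list in arrival order, slot 0 stands in for address zero, and the names `Entry`, `search` and `add` are hypothetical. Adding MAT, GO, RUN and NUM in that order reproduces the pointers of figure 4.

```python
from dataclasses import dataclass

NULL = 0  # null pointer; slot 0 of the table is never used for an entry


@dataclass
class Entry:
    label: str
    up: int = NULL
    down: int = NULL


def search(tree, wanted):
    """Same logic as listing 1; returns ("found", i) or ("not found", (i, field))."""
    current = 1
    while tree[current].label != wanted:
        field = "up" if tree[current].label > wanted else "down"
        nxt = getattr(tree[current], field)
        if nxt == NULL:
            return "not found", (current, field)
        current = nxt
    return "found", current


def add(tree, label):
    """Listing 2, plus the special case of the very first entry."""
    if len(tree) == 1:                   # empty tree: nothing points to it
        tree.append(Entry(label))
        return "added"
    status, where = search(tree, label)
    if status == "found":
        return "duplicate"
    tree.append(Entry(label))            # new entry at the end of the table
    entry, field = where
    setattr(tree[entry], field, len(tree) - 1)   # overwrite the null pointer
    return "added"


tree = [None]                            # slot 0 stands in for address zero
for label in ("MAT", "GO", "RUN", "NUM"):
    add(tree, label)
print([(e.label, e.up, e.down) for e in tree[1:]])
# [('MAT', 2, 3), ('GO', 0, 0), ('RUN', 4, 0), ('NUM', 0, 0)]
```

The records never move; only the one null pointer found by the search is changed.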



Figure 4. Inserting another entry into the binary tree. The up pointer from RUN now points to the new label, which is lower than MAT but higher than RUN.



Sorting with Binary Trees 

So much for searching. What about 
sorting? Although it may not appear so, 
once a tree has been built it is actually 
sorted as well. This can be seen by placing a 
piece of paper over figure 4 and slowly 
sliding it down. The labels will appear in 
sequence because the diagram shows them in 
logical order. Reading them back from the 
computer's memory isn't quite this simple, 
since the logical order is given by the poin- 
ters rather than by physical order, but it's 
not very hard either. 

The logic to read back a tree in order 
consists, in a nutshell, of performing the 
instructions in table 1. 

The complete algorithm for this process is shown in listing 3. In order to remember the path followed, a stack is used. Each stack entry requires 3 bytes: 2 for a 16 bit pointer and one for a flag to indicate whether the path was via an up or a down pointer.

    (call the search routine)
    IF (the item was found) THEN
        (return signaling 'duplicate')
    ENDIF

    (build an entry for the new item)
    (get the pointer left by the search routine)
    (use this pointer to store the address of the new entry
     . . . over the null pointer that terminated the search)
    (return signaling completion)

Listing 2. Binary tree data insertion logic. This routine uses the search routine to find out whether the item is already in the tree. If it is not, the search routine returns with the pointer indicating the null pointer that ended the search. The new entry will be added to the tree at this position.



    (search pointer = address of first tree entry)
    DO FOREVER
        IF (current up-pointer = 0) THEN
            (print the current entry)
            DO WHILE (current down-pointer = 0)
                IF (stack is empty) THEN
                    (return)
                ENDIF
                (search pointer = top pointer from stack)
                (search flag = top flag from stack)
                (pop the stack)
                DO WHILE (search flag = 'D')
                    IF (stack is empty) THEN
                        (return)
                    ENDIF
                    (search pointer = top pointer from stack)
                    (search flag = top flag from stack)
                    (pop the stack)
                ENDDO
                (print the current entry)
            ENDDO
            (push the stack)
            (top stack pointer = search pointer)
            (top stack flag = 'D')
            (search pointer = current down-pointer)
        ELSE
            (push the stack)
            (top stack pointer = search pointer)
            (top stack flag = 'U')
            (search pointer = current up-pointer)
        ENDIF
    ENDDO

Listing 3. This list of instructions is the procedure that is followed when sorting a binary tree.



Start at the root of the tree ('MAT' in the example).

If there is an up pointer, follow it and make a record of the path followed.

When returning from following the up pointer (or if the up pointer was null), print the current entry, then follow the down pointer and make a record of the path.

When both pointers have been processed (or if the down pointer is null), back up to the previous entry.

The sort is done when an attempt is made to back up to the previous entry and no previous entry is found.

Table 1. Instructions for carrying out a sort of a binary tree.



• Start at the beginning of the tree (MAT).

• MAT has an up pointer, so follow it, pushing the address of MAT (the entry being left) and a U flag (to indicate that the path followed an up pointer) onto the stack.

• The entry being pointed to now is GO.

• GO has a null up pointer, which requires no action.

• Having processed GO's up pointer, GO is printed.

• GO also has a null down pointer, which requires no action.

• Since both of GO's pointers have been processed, the stack is popped.

• The entry now being pointed to is MAT.

• Since a U flag was popped, the up pointer (but not the down pointer) has been examined; therefore MAT is printed.

• MAT has a valid down pointer, so it is followed, and the address of MAT and a D flag (since the path was via a down pointer) are pushed onto the stack.

• The entry being pointed to is now RUN.

• RUN has a valid up pointer, so the address of RUN and a U flag are pushed onto the stack and the up pointer is followed.

• The entry being pointed to is NUM.

• Since both of NUM's pointers are null, NUM is printed and the stack is popped.

• The entry being pointed to is now RUN.

• Since RUN's up pointer has already been processed, RUN is printed.

• RUN's down pointer is null, so the stack is again popped.

• MAT is now being pointed to.

• Since a D flag was popped, both the up and down pointers of MAT have been examined, so an attempt is made to pop the stack again.

• Since the stack is empty, it cannot be popped. This signals that all processing is complete.

Table 2. This is a trace of the procedure for reading back and printing the example tree of figure 4 using the logic routines of listing 3.



To illustrate this, table 2 is a trace of the 
procedure necessary to read back and print 
the example tree from figure 4. 
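For readers who want to run the trace, here is a Python sketch of listing 3. Entries live in a list with index 0 as the null pointer, and the stack holds (entry index, flag) pairs, with 'U' or 'D' recording which pointer was followed. The names `Entry` and `read_back` are hypothetical, and the routine returns the labels instead of printing them. Applied to the tree of figure 4 it yields GO, MAT, NUM, RUN, matching table 2.

```python
from dataclasses import dataclass

NULL = 0  # null pointer; slot 0 of the table is never used for an entry


@dataclass
class Entry:
    label: str
    up: int = NULL
    down: int = NULL


def read_back(tree):
    """Listing 3: return the labels in sort order using an explicit stack."""
    out = []
    stack = []                       # (entry index, 'U' or 'D') pairs
    current = 1                      # first tree entry
    while True:
        if tree[current].up == NULL:
            out.append(tree[current].label)
            while tree[current].down == NULL:
                if not stack:
                    return out
                current, flag = stack.pop()
                while flag == "D":   # both pointers of this entry are done
                    if not stack:
                        return out
                    current, flag = stack.pop()
                out.append(tree[current].label)  # back from its up pointer
            stack.append((current, "D"))
            current = tree[current].down
        else:
            stack.append((current, "U"))
            current = tree[current].up


# Figure 4: MAT (up GO, down RUN), RUN (up NUM).
tree = [None, Entry("MAT", up=2, down=3), Entry("GO"), Entry("RUN", up=4), Entry("NUM")]
print(read_back(tree))   # ['GO', 'MAT', 'NUM', 'RUN']
```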

This procedure will work for a tree of any size as long as there is enough room for the stack. The maximum number of stack entries required during the sorted readback depends on the order in which the data is placed in the tree. If the data comes in in random order, the number of stack entries grows only logarithmically with the number of entries. For a perfectly balanced tree it is about the base 2 logarithm of the number of entries, so a stack with a depth of 16 would handle a tree containing up to about 64,000 entries; a tree built from random data is usually noticeably deeper than a perfectly balanced one, however, so it is prudent to allow extra stack space. The worst case for stack depth occurs when the data was already sorted, or reverse sorted, when read in. This case will require nearly as many stack entries as there are items in the tree.
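The effect of input order on stack depth is easy to measure. The sketch below builds a tree from a sequence of distinct labels, using dictionaries in place of the up and down pointers, and reports the largest number of ancestors of any entry. That is the most entries the readback stack of listing 3 must hold at once. The function name and the sample sizes are hypothetical, and no particular random-case figure is claimed.

```python
import math
import random


def max_stack_depth(labels) -> int:
    """Largest number of ancestors of any entry in a tree built from labels.

    This is the most entries the readback stack of listing 3 must hold at
    once. Labels are assumed to be distinct.
    """
    up, down = {}, {}            # label -> label, standing in for pointers
    root = None
    deepest = 0
    for label in labels:
        if root is None:
            root = label
            continue
        current, depth = root, 0
        while True:
            depth += 1
            links = up if label < current else down   # above: up pointer
            if current not in links:
                links[current] = label                 # fill the null pointer
                deepest = max(deepest, depth)
                break
            current = links[current]
    return deepest


n = 1000
print("sorted input:", max_stack_depth(range(n)))
print("random input:", max_stack_depth(random.Random(3).sample(range(n), n)))
print("log2 of n:   ", round(math.log2(n), 1))
```

Sorted input yields a single chain, so the depth is one less than the number of entries.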

Optimizing Binary Trees 

If the tree routines use the lower 32 K 
bytes of address space, the sort readback 
stack can be reduced to two bytes per entry 
by using the high order address bit in place 
of the up or down flag. If this is done, the 
high order bit will probably have to be 
masked off before addresses from the stack 
are used to access memory. 
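A short Python sketch of the two byte stack entry makes the masking step concrete. It assumes every tree address is below 32K (0x8000), and it uses the freed high bit to mark a 'U' entry, an arbitrary choice. The names `pack` and `unpack` are hypothetical.

```python
HIGH_BIT = 0x8000  # free when every tree address is below 32K


def pack(address: int, flag: str) -> int:
    """Combine a 15-bit address and a 'U'/'D' flag into one 16-bit word."""
    if not 0 <= address < HIGH_BIT:
        raise ValueError("address must lie in the lower 32K")
    return address | HIGH_BIT if flag == "U" else address


def unpack(word: int) -> tuple[int, str]:
    """Split a stack word; the high bit is masked off before use as an address."""
    flag = "U" if word & HIGH_BIT else "D"
    return word & (HIGH_BIT - 1), flag


word = pack(0x1234, "U")
print(hex(word), unpack(word))
```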

It is possible to balance a binary tree as it is being built. Such a tree will always require close to the minimum number of comparisons for a search and close to the minimum stack depth during sort readback. On the other hand, balancing a tree requires two more bits per entry (hence probably an extra byte) and is quite complex. The balancing algorithm is too complicated to include here; however, a complete description can be found in The Art of Computer Programming, Volume 3 by Donald Knuth.
to provide the personal computer owner with some of the background information necessary to write and maintain programs effectively.

The series is designed to enhance the state of the personal computing art by encouraging readily understood, easily customized and fully designed programs.

This book introduces the subject of program design. The most critical part of developing a program is the design phase. Here most fatal errors are introduced and program specifications forgotten. It is also during this phase that errors are least costly to fix (both in terms of money and time), specifications are easiest to change, and program integrity simplest to ensure. And this is true whether the program is the latest operating system or the newest computer game. This volume will help the personal computer user looking for a better way to design.

Other books in the Program Techniques series will include: 
Simulation ISBN 0-07-037826-6 

Numbers in Theory and Practice ISBN 0-07-037827-4 
Bits and Pieces ISBN 0-07-037828-2 